How to Assess the Likelihood of Malicious Use of Advanced AI Systems

Center for Security and Emerging Technology | 1

Executive Summary

Policymakers are debating the risks that new advanced artificial intelligence (AI) technologies can pose if intentionally misused: from generating content for disinformation campaigns to instructing a novice how to build a biological agent. Because the technology is improving rapidly and the potential dangers remain unclear, assessing risk is an ongoing challenge.

Malicious-use risks are often considered to be a function of the likelihood and severity of the behavior in question. We focus on the likelihood that an AI technology is misused for a particular application and leave severity assessments to additional research.
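Expressed as a stylized formula (our illustration; the brief itself does not write one out):

```latex
\underbrace{\mathrm{Risk}(X, Y)}_{\text{misuse risk}}
  \;=\; f\Big(\underbrace{\Pr\big[X \text{ is misused for } Y\big]}_{\text{likelihood (this brief's focus)}},\;
  \underbrace{\mathrm{Severity}(Y)}_{\text{left to further research}}\Big)
```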

There are many strategies to reduce uncertainty about whether a particular AI system (call it X) will likely be misused for a specific malicious application (call it Y). We describe how researchers can assess the likelihood of malicious use of advanced AI systems at three stages:

1. Plausibility (P)

2. Performance (P)

3. Observed use (Ou)

Plausibility tests consider whether system X can do behavior Y at all. Performance tests ask how well X can perform Y. Information about observed use tracks whether X is used to do Y in the real world.

Familiarity with these three stages of assessment, including the methods used at each stage along with their limitations, can help policymakers critically evaluate claims about AI misuse threats, contextualize headlines describing research findings, and understand the work of the newly created network of AI safety institutes.

This Issue Brief summarizes the key points in: Josh A. Goldstein and Girish Sastry, “The PPOu Framework: A Structured Approach for Assessing the Likelihood of Malicious Use of Advanced AI Systems,” Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 7, no. 1 (2024): 503–518.


Introduction

Concerns about bad actors intentionally misusing advanced artificial intelligence (AI) systems are prevalent and controversial. These concerns are prevalent as they receive widespread media attention and are reflected in polls of the American public as well as in pronouncements and proposals by elected officials.1 Yet they are controversial because experts, both inside and outside of AI, express high levels of disagreement about the extent to which bad actors will misuse AI systems, how useful these systems will be compared to non-AI alternatives, and how much capabilities will change in the coming years.2

The disagreement about misuse risks from advanced AI systems is not merely academic. Claims about risk are often cited to support policy positions with significant societal implications, including whether to make models more or less accessible, whether and how to regulate frontier AI systems, and whether to halt development of more capable AI systems.3 If views of misuse risks will inform policy, it is critical for policymakers to understand how to evaluate malicious-use research.

In a new paper, “The PPOu Framework: A Structured Approach for Assessing the Likelihood of Malicious Use of Advanced AI Systems,” published in the Proceedings of the Seventh AAAI/ACM Conference on AI, Ethics, and Society, we provide a framework for thinking through the likelihood that an advanced AI system (call it X) will be misused for a particular malicious application (call it Y).4

The framework lays out three different stages for assessing the likelihood of malicious use:

1. Plausibility (P): Can system X perform malicious behavior Y even once?

2. Performance (P): How well can system X perform malicious behavior Y?

3. Observed use (Ou): Is system X used for malicious behavior Y in the real world?

Once a potential misuse risk has been identified, researchers can investigate the risk at each stage outlined in Figure 1. The figure also summarizes the key methodologies and challenges at each stage.


Figure 1. The PPOu Framework Summary

Research at each of these three stages addresses different forms of uncertainty. For example, while demonstrations at the plausibility stage may be able to show that system X can be used for behavior Y (or a behavior similar to Y) once, they will leave uncertainty about how useful X would be for potential bad actors. Risk assessments at the performance stage can help model the marginal utility for bad actors, but actual use of X by bad actors may differ from research expectations. Observed use can track actual applications to right-size expectations, but it will not determine how future systems could be misused or whether X will be used for variants of Y in the future.

We hope that by laying out these stages, we will provide policymakers with a better understanding of the types of uncertainty about malicious use and where research can, and cannot, plug in. In Figure 2, we provide examples of the types of questions researchers could ask at each stage from three risk areas: political manipulation, biological attacks, and cyber offense.


The PPOu Framework

Figure 2. Stages of the PPOu Framework and Example Questions

Plausibility

- Political manipulation: Could a multimodal agent generate and distribute a short video of partisans interfering in the electoral process?*

- Biological attacks: Could a chatbot guide a (resourced) undergrad through the process of creating a Category B potential bioterrorism agent?5

- Cyber offense: Could a large-language-model-based software agent identify and produce (but not necessarily deliver) a working exploit for a widely used piece of software?

Performance

- Political manipulation: How realistic, reliable, and cost-effective are multimodal models at generating and distributing videos of partisans attempting to interfere in the election process?

- Biological attacks: How much uplift does the chatbot provide over existing biological design tools?

- Cyber offense: How much does it cost to operate the agent to produce the exploit compared to a similarly skilled human?

Observed use

- Political manipulation: Do people use multimodal models to generate and distribute videos of partisans attempting to interfere in the electoral process, in practice?

- Biological attacks: Do request-response logs indicate that a user is applying a chatbot to guide them through creating a potential bioterrorism agent?

- Cyber offense: Is there chatter on criminal forums that people are experimenting with such an agent?

* On agents, see: Helen Toner, John Bansemer, Kyle Crichton et al., “Through the Chat Window and Into the Real World: Preparing for AI Agents,” Center for Security and Emerging Technology, October 2024, /publication/through-the-chat-window-and-into-the-real-world-preparing-for-ai-agents/.


Stage 1: Plausibility: Can system X perform malicious behavior Y even once?

The simplest way to test whether there is a risk of system X being used for malicious behavior Y is to see if X can do Y, just once. Red-teamers and stress testers adopt an adversary’s mindset and probe an AI system for “identification of harmful capabilities, outputs, or infrastructure threats.”6

If a model does not produce harmful behavior on the first try, the next step is to iterate. Researchers use different techniques, including improving prompts (the input fed to the model, such as instructions or examples) or fine-tuning (a small amount of additional training) a model for the specific behavior.

If X still fails to exhibit Y, despite employing various techniques and tricks, the question naturally becomes: How long should one continue trying? While researchers can demonstrate that system X can be used for malicious use Y by showing an example, proving the negative (that system X cannot do Y) is more challenging.

Because AI models are often general-purpose and our ability to predict their capabilities is still advancing, we may not know whether system X cannot perform Y, or whether the prompting strategies used have been insufficient. This is known as the “capability elicitation problem.” For example, one paper found that simply prepending “Take a deep breath” before a requested task improved performance.7 Analysts may thus conclude that a system could plausibly do Y if it gets close, within a certain margin of error (known as a “safety margin”), to account for possible gains from better elicitation techniques.8 The determination about what qualifies as close enough is a matter of judgment.
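The safety-margin logic above can be sketched in a few lines of code. Everything here is illustrative: the function name, the score scale, the thresholds, and the elicitation strategies are our own assumptions, not part of the PPOu paper.

```python
# Hypothetical sketch of a plausibility verdict with a safety margin.
# Scores, thresholds, and strategy names are invented for illustration.

def plausibility_verdict(best_observed_score: float,
                         success_threshold: float,
                         safety_margin: float) -> str:
    """Classify a red-teaming result for behavior Y on system X.

    best_observed_score: best score achieved across all elicitation
        attempts (prompt variants, fine-tuning, etc.), scaled to [0, 1].
    success_threshold: score at which Y counts as demonstrated.
    safety_margin: allowance for gains from better future elicitation.
    """
    if best_observed_score >= success_threshold:
        return "plausible"               # X did Y at least once
    if best_observed_score >= success_threshold - safety_margin:
        return "plausible-with-margin"   # close enough, given elicitation gaps
    return "not-demonstrated"            # note: this cannot prove the negative

# Iterating over elicitation strategies before rendering a judgment:
attempts = {"direct prompt": 0.35, "few-shot prompt": 0.55, "fine-tuned": 0.72}
verdict = plausibility_verdict(max(attempts.values()),
                               success_threshold=0.8, safety_margin=0.1)
print(verdict)  # "plausible-with-margin": 0.72 is within 0.1 of 0.8
```

The point of the sketch is that the final branch never proves X cannot do Y; it only records that the attempted elicitation strategies failed, which is exactly the judgment call the brief describes.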

To scale up red-teaming efforts, AI developers can use both humans and machines. Leading AI labs are hiring external contractors to red-team their models, allowing them to augment the expertise (and labor hours) they possess in-house.9 Researchers are also developing ways to use AI models to red-team other models, which is a promising direction for future research.10

Stage 2: Performance: How well can system X perform malicious behavior Y?

Showing that a system can do the malicious behavior once (Stage 1) does not mean it is necessarily a useful tool for doing so. Plausibility still leaves a lot of uncertainty, including about the quality of the output, the reliability of the system, and how useful it is compared to alternatives that may have existed before. As a result, research at Stage 2 focuses on addressing these and related questions, aiming to reduce uncertainty about the utility of the system in question, often through static benchmarks, experiments, or modeling marginal utility.

First, similar to taking a written test, researchers will test AI models against predetermined sets of questions. For example, can a model recognize images? Solve PhD-level math problems? Answer questions about the law? If researchers have a standardized set of questions, they can continually test new models against that benchmark as models are developed.11 This can provide comparability between models at relatively low cost. Recently, researchers have also been building benchmarks for potentially harmful applications.12 In practice, however, static benchmarks can be difficult to create. Sometimes they are “polluted” because the model was already provided with the answer as part of its training set.13 Other times, constructing them can be labor-intensive, necessitate specialized knowledge, or require access to classified material.
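The benchmark idea, and the “pollution” caveat, can be made concrete with a toy scoring loop. The questions, the toy model, and the verbatim-match contamination check are all invented stand-ins; real benchmarks and real train-test overlap analyses are far more involved.

```python
# Toy sketch of static benchmarking. A fixed question set lets each new
# model generation be scored on the same yardstick at low cost.

def score_model(model, benchmark: list) -> float:
    """Fraction of benchmark questions the model answers correctly."""
    correct = sum(1 for question, answer in benchmark if model(question) == answer)
    return correct / len(benchmark)

def contaminated_items(benchmark: list, training_corpus: str) -> list:
    """Flag questions whose answer appears verbatim in the training data.
    A crude stand-in for train-test overlap reporting: a high score on a
    'polluted' item reflects memorization, not capability."""
    return [question for question, answer in benchmark
            if answer in training_corpus]

benchmark = [("2+2?", "4"), ("Capital of France?", "Paris")]
toy_model = lambda question: {"2+2?": "4"}.get(question, "unsure")

print(score_model(toy_model, benchmark))  # 0.5
print(contaminated_items(benchmark, "... Paris is the capital of France ..."))
```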

A second approach is to conduct experiments, deliberately introducing an AI system or piece of AI-generated content in an environment to determine its effect on outcomes of interest. This could range from using AI in penetration-testing exercises to using AI to convince people against conspiracy beliefs.14 In lab and survey experiments, researchers can study the effects of different treatments, recruit respondents using existing pools, and straightforwardly ensure informed consent. However, for assessing malicious-use risks, researchers will still face limitations because of the duty to minimize harm to respondents. This is especially acute for field experiments that test the effects of an AI system in the real world.

Finally, researchers may test for uplift, that is, how useful a tool is compared to a set of alternatives.15 If a system can reliably produce instructions for designing a bioweapon, but a Google search could do the same, then the uplift is limited. These marginal utility tests require establishing a relevant set of alternatives (which will vary based on the threat model) and outcomes of interest. For example, if the goal is to assess whether language models will be useful for propaganda, uplift tests would require establishing baselines (human-written propaganda) and outcomes of interest, such as the cost of running a campaign or the number of people persuaded to take an action.16
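At its core, an uplift estimate is a comparison of an outcome of interest between an AI-assisted condition and a baseline condition. The sketch below uses made-up per-participant persuasion scores; the group labels and numbers are purely illustrative assumptions.

```python
# Illustrative uplift calculation: difference in mean outcome between an
# AI-assisted group and a baseline (e.g., search-engine-only) group.
# A result near zero suggests limited marginal utility under this
# threat model and outcome measure.

def uplift(ai_outcomes: list, baseline_outcomes: list) -> float:
    """Mean outcome in the AI-assisted condition minus the baseline mean."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(ai_outcomes) - mean(baseline_outcomes)

# Hypothetical persuasion scores from a survey experiment, one per participant
chatbot_group = [0.62, 0.58, 0.71]
search_group = [0.60, 0.55, 0.65]
print(round(uplift(chatbot_group, search_group), 3))  # 0.037
```

The hard part, as the brief notes, is not this arithmetic but choosing the right baseline for the threat model and an outcome measure that actually tracks harm.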


Performance testing could improve in several ways. First, researchers could test for the equivalent of “scaling laws” in the malicious-use domain.17 In other words, how much riskier in a malicious-use domain do models become with certain capability improvements (or scale increases)? From an institutional perspective, the field can continue to develop arrangements that minimize incentive issues. AI labs may have the best capability elicitation techniques, but they also have incentives to sandbag or not thoroughly test their own systems.18 In the future, government entities such as AI safety institutes could conduct a subset of malicious-use risk assessments to ensure testing occurs.19

Stage 3: Observed Use: Is system X used for malicious behavior Y in the real world?

In Stages 1 and 2, researchers can test whether system X can be used for malicious behavior Y and investigate how useful X may be. While forecasts prior to deployment estimate how likely system X is to be misused, research expectations and actual misuse may diverge. Policymakers must recognize that pre-deployment research may misestimate risks due to cognitive biases (e.g., analysts projecting their own assumptions onto adversaries) or unforeseen capabilities (e.g., emergent abilities of AI systems discovered after deployment). Historical examples, such as the misjudged threats of cyberattacks in the 1990s, highlight how anticipated risks can differ from actual misuse.20

The observed use stage shifts away from focusing on projected scenarios to discovering how bad actors misuse AI systems in the real world. This is often challenging, as actors misusing tools may deliberately obscure their activities due to reputational risks, legal concerns, or fears that exposure could undermine their effectiveness.

One method for uncovering misuse is monitoring by AI providers themselves. For example, companies that make their new systems available through an application programming interface could monitor requests and responses. OpenAI has recently released several reports describing influence operations misusing its tools.21 Monitoring by AI providers can be an effective strategy, because it can expose misuse early in an operation: for example, after covert propagandists begin creating content, but before they build large followings. However, this strategy also faces trade-offs related to user privacy.
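A stylized version of provider-side log monitoring is shown below. The log schema and the keyword-matching rule are invented stand-ins for illustration only; production systems would rely on trained classifiers plus human review, and any such monitoring has to be weighed against the user-privacy trade-offs just noted.

```python
# Stylized sketch of provider-side monitoring of API request/response logs.
# The log format and keyword rule are hypothetical; real deployments would
# use trained misuse classifiers rather than string matching.

FLAGGED_TERMS = {"bioterrorism agent", "working exploit"}  # illustrative only

def flag_requests(logs: list) -> list:
    """Return log entries whose request text contains a flagged term."""
    return [entry for entry in logs
            if any(term in entry["request"].lower() for term in FLAGGED_TERMS)]

logs = [
    {"user": "u1", "request": "Summarize this news article"},
    {"user": "u2", "request": "Step-by-step guide to a working exploit for ..."},
]
flagged = flag_requests(logs)
print([entry["user"] for entry in flagged])  # ['u2']
```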


A second route for discovery comes from outside the AI companies. Open-source researchers and journalists can uncover the use of X for malicious behavior Y. Potential routes to discovery include finding evidence of bad actors using AI in their workflows, interviewing them to ask how they use advanced AI systems, monitoring discussions on criminal forums, and more. The news outlet 404 Media has uncovered a range of applications of AI online, including spam and scams, demonstrating the role of journalists who closely track online developments.22

Last, researchers can develop incident databases to move beyond single case studies and better understand patterns of abuse. The AI Incident Database and the Political Deepfakes Incidents Database are two ongoing efforts. Those building AI incident databases can draw valuable lessons from fields with established incident reporting systems, such as airline safety, including considerations of the trade-offs between voluntary and mandatory disclosure.23

Because the application of X for malicious use Y may be intentionally hidden, the observed use stage faces several limitations. First, observational data about misuse may not be representative of the broader universe of cases, leading to faulty conclusions about areas that require heightened attention. Policymakers should be careful not to over-index on the misuse that is most easily countable. Furthermore, even if observed use today is representative, it may not project into the future. New capability elicitation techniques or improvements in model capabilities can lead to substantially different misuses of subsequent generations of systems. Future efforts could empower external researchers to work with AI companies to better understand misuse, develop increasingly capable classifiers for malicious use, and scale monitoring efforts.


Conclusion

Each stage of the framework (plausibility, performance, and observed use) attempts to reduce uncertainty about the likelihood of misuse of an advanced AI system. Can the model perform the harmful behavior, just once? How well does it do so, and how useful is it to bad actors? What is the existing evidence of bad actors misusing the tool, or similar ones, for this application in the real world?

For policymakers, these questions can be useful when encountering claims about misuse risks. For example, imagine coming across a headline that reads: “AI Chatbots Can Give Instructions for Creating Bioweapons.” Using the PPOu Framework, a discerning policymaker may ask whether that is a plausibility assessment (e.g., red-teaming found instructions once) or a result of performance testing (e.g., researchers found the system could generate valid instructions reliably). The policymaker then might search for further information: How good is the chatbot, and compared to what baseline? Just because a system can be used for a particular malicious purpose does not mean it will be in the real world. Policymakers can use the PPOu Framework as a guide, while recognizing that some degree of uncertainty about the likelihood of malicious use will always remain.

The diverse set of methodologies also highlights that a wide range of experts can contribute. As AI models become advanced and have wider applications, building up a risk assessment ecosystem will grow more important. The U.S. government should seek the advice of, and provide funding support for, researchers with different substantive expertise (such as misuse domains of interest) as well as different methodological training (for example, machine learning or human-subject experiments). If the network of AI safety institutes progresses, it, too, will be stronger for soliciting cooperation from a large tent.


Authors

Josh A. Goldstein is a research fellow at Georgetown University’s Center for Security and Emerging Technology (CSET), where he works on the CyberAI project. Goldstein is currently on rotation as a policy advisor at the Cybersecurity and Infrastructure Security Agency (CISA) under an Interdepartmental Personnel Act agreement with CSET. He completed this work before starting at CISA. The views expressed are the author’s own personal views and do not necessarily reflect the views of CISA or the Department of Homeland Security.

Girish Sastry is an independent policy researcher.

Reference

This policy memo summarizes an original research paper: Josh A. Goldstein and Girish Sastry, “The PPOu Framework: A Structured Approach for Assessing the Likelihood of Malicious Use of Advanced AI Systems,” Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 7, no. 1 (2024): 503–518.

Acknowledgments

For research assistance support, we thank Abhiram Reddy. For feedback on the underlying paper, we thank Lama Ahmad, Markus Anderljung, John Bansemer, Eden Beck, Rosie Campbell, Derek Chong, Jessica Ji, Igor Mikolic-Torreira, Andrew Reddie, Chris Rohlf, Colin Shea-Blymyer, Weiyan Shi, Toby Shevlane, Thomas Woodside, and participants at the NLP SoDaS Conference 2023.

? 2025 by the Center for Security and Emerging Technology. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. To view a copy of this license, visit /licenses/by-nc/4.0/.

Document Identifier: doi: 10.51593/20240042


Endnotes

1 Taylor Orth and Carl Bialik, “Majorities of Americans Are Concerned about the Spread of AI Deepfakes and Propaganda,” YouGov, September 12, 2023, /technology/articles/46058-majorities-americans-are-concerned-about-spread-ai; Office of Congressman Don Beyer, “Ross, Beyer Introduce Legislation to Regulate Artificial Intelligence Security, Mitigate Risk Incidents,” news release, September 23, 2024, /news/documentsingle.aspx?DocumentID=6304.

2 Sayash Kapoor et al., “On the Societal Impact of Open Foundation Models,” arXiv preprint arXiv:2403.07918 (2024), /abs/2403.07918; Yoshua Bengio et al., “International Scientific Report on the Safety of Advanced AI (Interim Report),” arXiv preprint arXiv:2412.05282 (2024), /abs/2412.05282.

3 Irene Solaiman, “The Gradient of Generative AI Release: Methods and Considerations,” in FAccT: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2023), /10.1145/3593013.3593981; Yoshua Bengio et al., “Managing Extreme AI Risks Amid Rapid Progress,” Science 384, no. 6698 (2024): 842–845, /doi/10.1126/science.adn0117; Yoshua Bengio et al., “Pause Giant AI Experiments: An Open Letter,” Future of Life Institute, March 22, 2023, /open-letter/pause-giant-ai-experiments/.

4 Josh A. Goldstein and Girish Sastry, “The PPOu Framework: A Structured Approach for Assessing the Likelihood of Malicious Use of Advanced AI Systems,” Proceedings of the Seventh AAAI/ACM Conference on AI, Ethics, and Society 7, no. 1 (2024): 503–518, /10.1609/aies.v7i1.31653.

5 “Potential Bioterrorism Agents,” Department of Molecular Virology and Microbiology, Baylor College of Medicine, accessed November 22, 2024, /departments/molecular-virology-and-microbiology/emerging-infections-and-biodefense/potential-bioterrorism-agents.

6 “Issue Brief: What Is Red Teaming?” (Frontier Model Forum, October 27, 2023), /updates/red-teaming/.

7 Chengrun Yang et al., “Large Language Models as Optimizers,” 12th International Conference on Learning Representations (ICLR 2024), /pdf?id=Bb4VGOWELI.

8 Mary Phuong et al., “Evaluating Frontier Models for Dangerous Capabilities,” arXiv preprint arXiv:2403.13793 (2024), /abs/2403.13793.

9 “OpenAI Red Teaming Network,” OpenAI, September 19, 2023, /index/red-teaming-network/.

10 Alex Beutel et al., “Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning,” arXiv preprint arXiv:2412.18693 (2024), /abs/2412.18693.

11 “Browse State-of-the-Art,” Papers with Code, accessed November 22, 2024, /sota.

12 See, for instance, Manish Bhatt et al., “CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models,” arXiv preprint arXiv:2404.13161 (2024), /abs/2404.13161.

13 Andy K. Zhang et al., “Language Model Developers Should Report Train-Test Overlap,” arXiv preprint arXiv:2410.08385 (2024), /abs/2410.08385.

14 Thomas H. Costello, G
