
Google

Responsible AI Progress Report

Published in February 2025


Foreword

AI is a transformational technology that offers both a unique opportunity to meet our mission, and the chance to expand scientific discovery and tackle some of the world's most important problems. At Google we believe it's crucial that we continue to develop and deploy AI responsibly, with a focus on making sure that people, businesses, and governments around the world can benefit from its extraordinary potential while at the same time mitigating against its potential risks.

In 2018, we were one of the first in the industry to adopt AI Principles, and since then, we've published annual AI responsibility reports detailing our progress. This year's report shares information from our latest research and practice on AI safety and responsibility topics. It details our methods for governing, mapping, measuring, and managing AI risks aligned to the NIST framework, as well as updates on how we're operationalizing responsible AI innovation across Google. We also provide more specific insights and best practices on topics ranging from our rigorous red teaming and evaluation processes to how we mitigate risk using techniques including better safety tuning and filters, security and privacy controls, provenance technology in our products, and broad AI literacy education.

Our approach to AI responsibility has evolved over the years to address the dynamic nature of our products, the external environment, and the needs of our global users. Since 2018, AI has evolved into a general-purpose technology used daily by billions of people and countless organizations and businesses. The broad establishment of responsibility frameworks has been an important part of this evolution. We've been encouraged by progress on AI governance coming from bodies like the G7 and the International Organization for Standardization, and also frameworks emerging from other companies and academic institutions. Our updated AI Principles, centered on bold innovation, responsible development, and collaborative partnership, reflect what we're learning as AI continues to advance rapidly.

As AI technology and discussions about its development and uses continue to evolve, we will continue to learn from our research and users, and innovate new approaches to responsible development and deployment. As we do, we remain committed to sharing what we learn with the broader ecosystem through the publication of reports like this, and also through continuous engagement, discussion, and collaboration with the wider community to help maximize the benefits of AI for everyone.

Laurie Richardson
Vice President, Trust & Safety, Google


Summary of our responsible AI approach

We have developed an approach to AI governance that focuses on responsibility throughout the AI development lifecycle. This approach is guided by our AI Principles, which emphasize bold innovation, responsible development, and collaborative progress. Our ongoing work in this area reflects key concepts in industry guidelines like the NIST AI Risk Management Framework.

Govern

Our AI Principles guide our decision-making and inform the development of our different frameworks and policies, including the Secure AI Framework for security and privacy, and the Frontier Safety Framework for evolving model capabilities and mitigations. Additional policies address design, safety, and prohibited uses.

Our pre- and post-launch processes ensure alignment with these Principles and policies through clear requirements, mitigation support, and leadership reviews. These cover model and application requirements, with a focus on safety, privacy, and security. Post-launch monitoring and assessments enable continuous improvement and risk management.

We regularly publish external model cards and technical reports to provide transparency into model creation, function, and intended use. And we invest in tooling for model and data lineage to promote transparency and accountability.

Map

We take a scientific approach to mapping AI risks through research and expert consultation, codifying these inputs into a risk taxonomy.

A core component is risk research, encompassing emerging AI model capabilities, emerging risks from AI, and potential AI misuse. This research, which we have published in over 300 papers, directly informs our AI risk taxonomy, launch evaluations, and mitigation techniques.

Our approach also draws on external domain expertise, offering new insights to help us better understand emerging risks and complementing in-house work.

Measure

We have developed a rigorous approach to measuring AI model and application performance, focusing on safety, privacy, and security benchmarks. Our approach is continually evolving, incorporating new measurement techniques as they become available.

Multi-layered red teaming plays a critical role in our approach, with both internal and external teams proactively testing AI systems for weaknesses and identifying emerging risks. Security-focused red teaming simulates real-world attacks, while content-focused red teaming identifies potential vulnerabilities and issues. External partnerships and AI-assisted red teaming further enhance this process.

Model and application evaluations are central to this measurement approach. These evaluations assess alignment with established frameworks and policies, both before and after launch.

AI-assisted evaluations help us scale our risk measurement. AI autoraters streamline evaluation and labeling processes. Synthetic testing data expedites scaled measurement. And automatic testing for security vulnerabilities helps us assess code risks in real time.

Manage

We deploy and evolve mitigations to manage content safety, privacy, and security, such as safety filters and jailbreak protections. We often phase our launches with audience-specific testing, and conduct post-launch monitoring of user feedback for rapid remediation.

We work to advance user understanding of AI through innovative developments in provenance technology, our research-backed explainability guidelines, and AI literacy education.

To support the broader ecosystem, we provide research funding, as well as tools designed for developers and users. We also promote industry collaboration on the development of standards and best practices.


Summary of our responsible AI outcomes to date

Building AI responsibly requires collaboration across many groups, including researchers, industry experts, governments, and users. We are active contributors to this ecosystem, working to maximize AI's potential while safeguarding safety, privacy, and security.

- 300+ research papers on AI responsibility and safety topics.
- $120 million for AI education and training around the world.
- Achieved a "mature" rating for Google Cloud AI in a third-party evaluation of readiness through the NIST AI Risk Management Framework governance and ISO/IEC 42001 compliance.
- Certified the Gemini app, Google Cloud, and Google Workspace through the ISO/IEC 42001 process.
- 19,000 security professionals have taken the SAIF Risk Self Assessment to receive a personalized report of AI risks relevant to their organization.
- Partnered on AI responsibility with outside groups and institutions like the Frontier Model Forum, the Partnership on AI, the World Economic Forum, MLCommons, Thorn, the Coalition for Content Provenance and Authenticity, the Digital Trust & Safety Partnership, the Coalition for Secure AI, and the Ad Council.


Govern: Full-stack AI governance

We take a full-stack approach to AI governance, from responsible model development and deployment to post-launch monitoring and remediation. Our policies and principles guide our decision-making, with clear requirements at the pre- and post-launch stages, leadership reviews, and documentation.

Policies and principles

Our governance process is grounded in our principles and frameworks:

AI Principles. We established and evolve our AI Principles to guide our approach to developing and deploying AI models and applications. Core to these Principles is pursuing AI efforts where the likely overall benefits substantially outweigh the foreseeable risks.

Model safety framework. The Frontier Safety Framework, which we recently updated, helps us to proactively prepare for potential risks posed by more powerful future AI models. The Framework follows the emerging approach of Responsible Capability Scaling proposed by the U.K.'s AI Safety Institute.

Content safety policies. Our policies for mitigating harm in areas such as child safety, suicide, and self-harm have been informed by years of research, user feedback, and expert consultation. These policies guide our models and products to minimize certain types of harmful outputs. Some individual applications, like the Gemini app, also have their own policy guidelines. We also prioritize neutral and inclusive design principles, with a goal of minimizing unfair bias. And we have Prohibited Use Policies governing how people can engage with our AI models and features.

Security and privacy framework. Our Secure AI Framework focuses on the security and privacy dimensions of AI.

Application-specific development frameworks. In addition to Google-wide frameworks and policies, several of our applications have specific frameworks to guide their day-to-day development and operation.

Our approach to the Gemini app guides our day-to-day development of the app and its behavior. We believe the Gemini app should:

1. Follow your directions. Gemini's top priority is to serve you well.
2. Adapt to your needs. Gemini strives to be the most helpful AI assistant.
3. Safeguard your experience. Gemini aims to align with a set of policy guidelines and is governed by Google's Prohibited Use Policy.


Pre- and post-launch reviews

We operationalize our principles, frameworks, and policies through a system of launch requirements, leadership reviews, and post-launch requirements designed to support continuous improvement.

Model requirements. Governance requirements for models focus on filtering training data for quality, model performance, and adherence to policies, as well as documenting training techniques in technical reports and model cards. These processes also include safety, privacy, and security criteria.

Application requirements. Launch requirements for applications address risks and include testing and design guidance. For example, an application that generates audiovisual content is required to incorporate a robust provenance solution like SynthID. These requirements are based on the nature of the product, its intended user base, planned capabilities, and the types of output involved. For example, an application made available to minors may have additional requirements in areas like parental supervision and age-appropriate content.

Leadership reviews. Executive reviewers with expertise in responsible AI carefully assess evaluation results, mitigations, and risks before making a launch decision. They also oversee our frameworks, policies, and processes, ensuring that these evolve to account for new modalities and capabilities.

Post-launch requirements. Our governance continues post-launch with assessments for any issues that might arise across products. Post-launch governance identifies unmitigated residual and emerging risks, and opportunities to improve our models, applications, and our governance processes.

Launch infrastructure. We are evolving our infrastructure to streamline AI launch management, responsibility testing, and mitigation progress monitoring.

Documentation

We foster transparency and accountability throughout our AI governance processes.

Model documentation. External model cards and technical reports are published regularly as transparency artifacts. Technical reports provide details about how our most advanced AI models are created and how they function. This includes offering clarity on the intended use cases, any potential limitations of the models, and how our models are developed in collaboration with safety, privacy, security, and responsibility teams. In addition, we publish model cards for our most capable models and open models. These cards offer summaries of technical reports in a "nutrition label" format to surface vital information needed for downstream developers or to help policy leaders assess the safety of a model.

Data and model lineage. We are investing in robust infrastructure to support data and model lineage tracking, enabling us to understand the origins and transformations of data and models used in our AI applications.
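The report does not describe the internal schema behind this tracking, but a minimal sketch of the kind of record it implies might look like the following; every class, field, and identifier here is a hypothetical illustration, not Google's tooling.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    """Hypothetical record tying an artifact (dataset or model) to its origins."""
    artifact_id: str                    # e.g. "model:demo/1" (invented ID scheme)
    artifact_type: str                  # "dataset" or "model"
    created_by: str                     # team or pipeline responsible
    transformation: str                 # e.g. "dedup + safety filtering"
    parents: list[str] = field(default_factory=list)  # artifacts this was derived from


def trace_origins(artifact_id: str, registry: dict[str, LineageRecord]) -> list[str]:
    """Walk parent links depth-first back to root artifacts."""
    record = registry.get(artifact_id)
    if record is None or not record.parents:
        return [artifact_id]
    origins: list[str] = []
    for parent in record.parents:
        origins.extend(trace_origins(parent, registry))
    return origins


# Usage: trace a fine-tuned model back to its raw data source.
registry = {
    "model:demo/1": LineageRecord("model:demo/1", "model", "team-x",
                                  "fine-tune", parents=["data:web-filtered"]),
    "data:web-filtered": LineageRecord("data:web-filtered", "dataset", "team-y",
                                       "dedup + safety filtering",
                                       parents=["data:web-raw"]),
}
print(trace_origins("model:demo/1", registry))  # ['data:web-raw']
```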

Our responsible AI approach reflects key concepts in industry guidelines like the NIST AI Risk Management Framework: govern, map, measure, and manage.

- Govern: A proactive governance approach to responsible AI development and deployment.
- Map: Identify current, emerging, and potential future AI risks.
- Measure: Evaluate and monitor identified risks and enhance testing methods.
- Manage: Establish and implement relevant and effective mitigations.


Case study: Promoting AI transparency with model cards

Model cards were introduced in a Google research paper in 2019 as a way to document and provide transparency about how we evaluate models. That paper proposed some basic model card fields that would help provide model end users with the information they need to evaluate how and when to use a model. Many of the fields first proposed remain vital categories of metadata that are found in model cards across the industry today. Previous iterations of our model cards, such as one to predict 3D facial surface geometry and one for an object detection model, conveyed important information about those respective models.

However, as generative AI models have advanced, we have adapted our most recent model cards, such as the card for our highest quality text-to-image model Imagen 3, to reflect the rapidly evolving landscape of AI development and deployment. While these model cards still contain some of the same categories of metadata we originally proposed in 2019, they also prioritize clarity and practical usability, and include an assessment of a model's intended usage, limitations, risks and mitigations, and ethical and safety considerations.

As models continue to evolve, we will work to recognize the key commonalities between models in these model cards. By identifying these commonalities, while also remaining flexible in our approach, we can use model cards to support a shared understanding and increased transparency around how models work.

[Figure: Model Card. The model card fields suggested in our 2019 research paper "Model Cards for Model Reporting."]

- Model Details: Basic information about the model: person or organization developing the model; model date; model version; model type; information about training algorithms, parameters, fairness constraints or other applied approaches, and features; paper or other resource for more information; citation details; license; where to send questions or comments about the model.
- Intended Use: Use cases that were envisioned during development: primary intended uses; out-of-scope use cases.
- Factors: Factors could include demographic or phenotypic groups, environmental conditions, technical attributes, or others listed as required: relevant factors; evaluation factors.
- Metrics: Metrics should be chosen to reflect potential real-world impacts of the model: model performance measures; decision thresholds; variation approaches.
- Evaluation Data: Details on the dataset(s) used for the quantitative analyses in the card: datasets; motivation; preprocessing.
- Training Data: May not be possible to provide in practice. When possible, this section should mirror Evaluation Data. If such detail is not possible, minimal allowable information should be provided here, such as details of the distribution over various factors in the training datasets.
- Quantitative Analyses: Unitary results; intersectional results.
- Ethical Considerations.
- Caveats and Recommendations.
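To make the "nutrition label" idea concrete, here is a minimal sketch of a machine-readable card using the 2019 field categories from the figure above. The field names follow that figure; all values are invented placeholders and do not describe any real Google model.

```python
import json

# Illustrative model card; every value is a placeholder, not a real model.
model_card = {
    "model_details": {
        "developed_by": "Example Research Team",
        "model_date": "2025-01",
        "model_version": "1.0",
        "model_type": "text classifier",
        "license": "Apache-2.0",
        "contact": "feedback@example.com",
    },
    "intended_use": {
        "primary_intended_uses": ["flagging policy-violating comments for human review"],
        "out_of_scope_uses": ["fully automated moderation decisions"],
    },
    "factors": ["language", "dialect", "topic domain"],
    "metrics": {"performance": "AUC-ROC per factor", "decision_threshold": 0.7},
    "evaluation_data": {
        "datasets": ["held-out comment corpus"],
        "preprocessing": "lowercasing, deduplication",
    },
    "training_data": "Distribution over languages reported; raw data not releasable.",
    "quantitative_analyses": {
        "unitary": "per-language AUC",
        "intersectional": "language x dialect AUC",
    },
    "ethical_considerations": "Risk of uneven false-positive rates across dialects.",
    "caveats_and_recommendations": "Re-evaluate before applying to new domains.",
}

# Serialize so downstream developers and reviewers can consume the card.
print(json.dumps(model_card, indent=2))
```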


Map: Identifying and understanding risks

We take a scientific approach to mapping AI risks through research and expert consultation, codifying these inputs into a risk taxonomy. Our mapping process is fundamentally iterative, evolving alongside the technology, and adapting to the range of contexts in which people use AI models or applications.

Risk research

We've published more than 300 papers on responsible AI topics, and collaborated with research institutions around the world. Recent areas of focus include:

Research on novel AI capabilities. We research the potential impact of emerging AI capabilities such as new modalities and agentic AI, to better understand if and how they materialize, as well as identifying potential mitigations and policies.

Research on emerging risks from AI. We also invest in research on the potential emerging risks from AI in areas like biosecurity, cybersecurity, self-proliferation, dangerous capabilities, misinformation, and privacy, to evolve our mitigations and policies.

Research on AI misuse. Mapping the potential misuse of generative AI has become a core area of research, and contributes to how we assess and evaluate our own models in these risk areas, as well as potential mitigations. This includes recent research into how government-backed threat actors are trying to use AI and whether any of this activity represents novel risks.

External domain expertise

We augment our own research by working with external domain experts and trusted testers who can help further our mapping and understanding of risks.

External expert feedback. We host workshops and demos at our Google Safety Engineering Centers around the world and industry conferences, garnering insights across academia, civil society, and commercial organizations.

Trusted testers. Teams can also leverage external trusted testing groups who receive secure access to test models and applications according to their domain expertise.

Risk taxonomy

We've codified our mapping work into a taxonomy of potential risks associated with AI, building on the NIST AI Risk Management Framework and informed by our experiences developing and deploying a wide range of AI models and applications. These risks span safety, privacy, and security, as well as transparency and accountability risks such as unclear provenance or lack of explainability. This risk map is designed to enable clarity around which risks are most relevant to understand for a given launch, and what might be needed to mitigate those risks.
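The taxonomy itself is not published in the report, but a sketch of how such a risk map could be represented and queried per launch might look like this. The category names come from the paragraph above; the individual risk entries and the modality tags are hypothetical examples.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    category: str        # "safety" | "privacy" | "security" | "transparency"
    applies_to: set[str]  # modalities the risk is relevant to


# Hypothetical entries for illustration; the real taxonomy is not public.
TAXONOMY = [
    Risk("harmful content generation", "safety", {"text", "image", "audio"}),
    Risk("training data memorization", "privacy", {"text"}),
    Risk("prompt injection", "security", {"text", "agent"}),
    Risk("unclear provenance", "transparency", {"image", "audio", "video"}),
]


def risks_for_launch(modalities: set[str]) -> list[Risk]:
    """Select taxonomy entries relevant to a launch's output modalities."""
    return [r for r in TAXONOMY if r.applies_to & modalities]


# A launch generating images would surface provenance and content risks.
print([r.name for r in risks_for_launch({"image"})])
```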


A selection of our latest research publications focused on responsible AI

June 2024
- Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data
- Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

July 2024
- On Scalable Oversight with Weak LLMs Judging Strong LLMs
- Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
- ShieldGemma: Generative AI Content Moderation Based on Gemma

August 2024
- Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
- Imagen 3

September 2024
- Knowing When to Ask - Bridging Large Language Models and Data
- Operationalizing Contextual Integrity in Privacy-Conscious Assistants
- A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models

October 2024
- New Contexts, Old Heuristics: How Young People in India and the US Trust Online Content in the Age of Generative AI
- All Too Human? Mapping and Mitigating the Risk from Anthropomorphic AI
- Gaps in the Safety Evaluation of Generative AI
- Insights on Disagreement Patterns in Multimodal Safety Perception across Diverse Rater Groups
- STAR: SocioTechnical Approach to Red Teaming Language Models

November 2024
- A New Golden Age of Discovery: Seizing the AI for Science Opportunity

December 2024
- Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice

January 2025
- Adversarial Misuse of Generative AI
- How we Estimate the Risk from Prompt Injection Attacks on AI Systems


Case study: Mapping and addressing risks to safely deploy AlphaFold 3

In May 2024, Google DeepMind released AlphaFold 3, an AI model capable of predicting molecular structures and how they interact, which holds the promise of transforming scientists' understanding of the biological world and accelerating drug discovery. Scientists can access the majority of its capabilities, for free, through our AlphaFold Server, an easy-to-use research tool, or via open code and weights.

We carried out extensive research throughout AlphaFold 3's development to understand how it might help or pose risks to biosecurity. Over the course of AlphaFold's development, we consulted with more than 50 external experts across various fields, including DNA synthesis, virology, and national security, to understand their perspectives on the potential benefits and risks.

An ethics and safety assessment was conducted with external experts, in which potential risks and benefits of AlphaFold 3 were identified and analyzed, including their potential likelihood and impact. This assessment was grounded in the specific technical capacities of the model and compared the model to other resources like the Protein Data Bank and other AI biology tools. The assessment was then reviewed by a council of senior internal experts in AI responsibility and safety, who provided further feedback.

As with all Google DeepMind models, AlphaFold 3 was developed, trained, stored, and served within Google's infrastructure, supported by security teams, engineers, and researchers. Quantitative and qualitative techniques are used to monitor the adoption and impact of AlphaFold 3. We partnered with the European Bioinformatics Institute of the European Molecular Biology Laboratory (EMBL) to launch free tutorials on how to best use AlphaFold that more than 10,000 scientists have accessed. We are currently expanding the course and partnering with local capacity builders to accelerate the equitable adoption of AlphaFold 3.

To continue to identify and map emerging risks and benefits from AI to biosecurity, we contribute to civil society and industry efforts such as the Nuclear Threat Initiative's AI-Bio Forum and the Frontier Model Forum, as well as engaging with government bodies.

AlphaFold is accelerating breakthroughs in biology with AI, and has revealed millions of intricate 3D protein structures, helping scientists understand how life's molecules interact.


Measure: Assessing risks and mitigations

After identifying and understanding risks through mapping, we systematically assess our AI systems through red teaming exercises. We evaluate how well our models and applications perform, and how effectively our risk mitigations work, based on benchmarks for safety, privacy, and security. Our approach evolves with developments in the underlying technology, new and emerging risks, and as new measurement techniques emerge, such as AI-assisted evaluations.

Multi-layered red teaming

Red teaming exercises, conducted both internally and externally, proactively assess AI systems for weaknesses and areas for improvement. Teams working on these exercises collaborate to promote information sharing and industry alignment in red teaming standards.

Security-focused red teaming. Our AI Red Team combines security and AI expertise to simulate attackers who might target AI systems. Based on threat intelligence from teams like the Google Threat Intelligence Group, the AI Red Team explores and identifies how AI features can cause security issues, recommends improvements, and helps ensure that real-world attackers are detected and thwarted before they cause damage.

Content-focused red teaming. Our Content Adversarial Red Team (CART) proactively identifies weaknesses in our AI systems, enabling us to mitigate risks before product launch. CART has conducted over 150 red teaming exercises across various products. Our internal AI tools also assist human expert red teamers and increase the number of attacks they're able to test for.

External red teaming partnerships. Our external red teaming includes live hacking events such as DEF CON and Escal8, targeted research grants, challenges, and vulnerability rewards programs to complement our internal evaluations.

AI-assisted red teaming. To enhance our approach, we have developed forms of AI-assisted red teaming, training AI agents to find potential vulnerabilities in other AI systems, drawing on work from gaming breakthroughs like AlphaGo. For example, we recently shared details of how we used AI-assisted red teaming to understand how vulnerable our systems may be to indirect prompt injection attacks, and to inform how we mitigate the risk.
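As an illustration of what agent-driven probing for indirect prompt injection can look like, here is a minimal sketch. It is not Google's actual tooling: the `attacker_model` and `target_system` objects, their method signatures, and the canary-based success check are all hypothetical stand-ins.

```python
# Minimal sketch of AI-assisted red teaming for indirect prompt injection.
# An attacker LLM iteratively crafts payloads; a planted canary token in the
# target's response signals a successful exfiltration.

SECRET = "CANARY-1234"  # planted token; leaking it means the attack worked


def run_probe_loop(attacker_model, target_system, rounds: int = 20) -> list[str]:
    """Collect injection payloads that cause the target to leak the canary.

    Each payload is embedded in a document the target is asked to summarize
    (the 'indirect' channel); successful payloads are kept so mitigations
    can later be regression-tested against them.
    """
    successes: list[str] = []
    feedback = "no attempts yet"
    for _ in range(rounds):
        # Hypothetical call: the attacker proposes a payload given feedback.
        payload = attacker_model.propose(
            goal=f"make the assistant reveal the string {SECRET!r}",
            feedback=feedback,
        )
        document = f"Quarterly report.\n{payload}\nEnd of report."
        # Hypothetical call: the system under test processes the document.
        response = target_system.summarize(document)
        if SECRET in response:
            successes.append(payload)
            feedback = "success; try a different style"
        else:
            feedback = f"blocked; the response was: {response[:200]}"
    return successes
```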

Model and application evaluations

A core component of our measurement approach is running evaluations for models and applications. These evaluations primarily focus on known risks, in contrast to red teaming, which focuses on known and unknown risks.

Model evaluations. A subset of the mapped risks is relevant to test at the model level. For example, as we prepared to launch Gemini 1.5 Pro, we evaluated the model for risks such as self-proliferation, offensive cybersecurity, child safety harms, and persuasion. We also develop new evaluations in key areas, such as our work on FACTS Grounding, which is a benchmark for evaluating how accurately LLMs ground their responses in provided source material and avoid hallucinations.
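To give a feel for what a grounding evaluation measures, here is a schematic sketch only: the real FACTS Grounding benchmark uses curated documents and LLM judges, whereas the `respond` and `judge_model` interfaces and the simple pass/fail scoring below are simplified placeholders.

```python
# Schematic grounding evaluation: what fraction of responses stay fully
# supported by their source documents?

def grounding_score(examples, respond, judge_model) -> float:
    """Score a model's grounding on (document, question) pairs.

    `respond` is the model under test; `judge_model.supported(...)` is a
    hypothetical judge call returning True when every claim in the response
    is backed by the document.
    """
    supported = 0
    total = 0
    for document, question in examples:
        prompt = (
            "Answer using ONLY the document below.\n\n"
            f"Document:\n{document}\n\nQuestion: {question}"
        )
        answer = respond(prompt)
        if judge_model.supported(document=document, response=answer):
            supported += 1
        total += 1
    return supported / max(total, 1)
```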

Application evaluations. These evaluations are designed to assess the extent to which a given application follows the frameworks and policies that apply to that application. This pre-launch testing generally covers a wide range of risks spanning safety, privacy, and security, and this portfolio of testing results helps inform launch decisions. We also invest in systematic post-launch testing that can take different forms, such as running regression testing for evaluating an application's ongoing alignment with our frameworks and policies, and cross-product evaluations to identify whether known risks for one application may have manifested in other applications.
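One simple way to realize such regression testing is sketched below, under assumed interfaces: the per-policy scores, baseline values, and tolerance threshold are illustrative, not the report's actual process.

```python
# Illustrative post-launch regression check: re-run a fixed policy
# evaluation suite against each release and flag score drops.

BASELINE = {"safety": 0.97, "privacy": 0.99, "security": 0.95}  # prior release
TOLERANCE = 0.01  # allowed drop before a regression is flagged


def check_regressions(current_scores: dict[str, float]) -> list[str]:
    """Compare per-policy scores for the new release against the baseline."""
    regressions = []
    for policy, baseline in BASELINE.items():
        score = current_scores.get(policy, 0.0)
        if score < baseline - TOLERANCE:
            regressions.append(f"{policy}: {baseline:.2f} -> {score:.2f}")
    return regressions


# Example: scores from re-running the suite on the newest build.
new_release = {"safety": 0.94, "privacy": 0.99, "security": 0.96}
for item in check_regressions(new_release):
    print("REGRESSION", item)
```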

AI-assisted evaluations

As AI continues to scale, it's critical that our ability to measure risks scales along with it. That's why we're investing in automated testing solutions, which can run both before launch and on an ongoing basis after release.

AI autoraters. At the model layer, Gemini
