Responsible AI Progress Report
Published in February 2025
Foreword
AI is a transformational technology that offers both a unique opportunity to meet our mission, and the chance to expand scientific discovery and tackle some of the world's most important problems. At Google we believe it's crucial that we continue to develop and deploy AI responsibly, with a focus on making sure that people, businesses, and governments around the world can benefit from its extraordinary potential while at the same time mitigating against its potential risks.

In 2018, we were one of the first in the industry to adopt AI Principles, and since then, we've published annual AI responsibility reports detailing our progress. This year's report shares information from our latest research and practice on AI safety and responsibility topics. It details our methods for governing, mapping, measuring, and managing AI risks aligned to the NIST framework, as well as updates on how we're operationalizing responsible AI innovation across Google. We also provide more specific insights and best practices on topics ranging from our rigorous red teaming and evaluation processes to how we mitigate risk using techniques, including better safety tuning and filters, security and privacy controls, provenance technology in our products, and broad AI literacy education.

Our approach to AI responsibility has evolved over the years to address the dynamic nature of our products, the external environment, and the needs of our global users. Since 2018, AI has evolved into a general-purpose technology used daily by billions of people and countless organizations and businesses. The broad establishment of responsibility frameworks has been an important part of this evolution. We've been encouraged by progress on AI governance coming from bodies like the G7 and the International Organization for Standardization, and also frameworks emerging from other companies and academic institutions. Our updated AI Principles—centered on bold innovation, responsible development, and collaborative partnership—reflect what we're learning as AI continues to advance rapidly.

As AI technology and discussions about its development and uses continue to evolve, we will continue to learn from our research and users, and innovate new approaches to responsible development and deployment. As we do, we remain committed to sharing what we learn with the broader ecosystem through the publication of reports like this, and also through continuous engagement, discussion, and collaboration with the wider community to help maximize the benefits of AI for everyone.

Laurie Richardson
Vice President, Trust & Safety, Google
Summary of our responsible AI approach

We have developed an approach to AI governance that focuses on responsibility throughout the AI development lifecycle. This approach is guided by our AI Principles, which emphasize bold innovation, responsible development, and collaborative progress. Our ongoing work in this area reflects key concepts in industry guidelines like the NIST AI Risk Management Framework.
Govern

Our AI Principles guide our decision-making and inform the development of our different frameworks and policies, including the Secure AI Framework for security and privacy, and the Frontier Safety Framework for evolving model capabilities and mitigations. Additional policies address design, safety, and prohibited uses.

Our pre- and post-launch processes ensure alignment with these Principles and policies through clear requirements, mitigation support, and leadership reviews. These cover model and application requirements, with a focus on safety, privacy, and security. Post-launch monitoring and assessments enable continuous improvement and risk management.

We regularly publish external model cards and technical reports to provide transparency into model creation, function, and intended use. And we invest in tooling for model and data lineage to promote transparency and accountability.

Map

We take a scientific approach to mapping AI risks through research and expert consultation, codifying these inputs into a risk taxonomy.

A core component is risk research, encompassing emerging AI model capabilities, emerging risks from AI, and potential AI misuse. This research, which we have published in over 300 papers, directly informs our AI risk taxonomy, launch evaluations, and mitigation techniques.

Our approach also draws on external domain expertise, offering new insights to help us better understand emerging risks and complementing in-house work.

Measure

We have developed a rigorous approach to measuring AI model and application performance, focusing on safety, privacy, and security benchmarks. Our approach is continually evolving, incorporating new measurement techniques as they become available.

Multi-layered red teaming plays a critical role in our approach, with both internal and external teams proactively testing AI systems for weaknesses and identifying emerging risks. Security-focused red teaming simulates real-world attacks, while content-focused red teaming identifies potential vulnerabilities and issues. External partnerships and AI-assisted red teaming further enhance this process.

Model and application evaluations are central to this measurement approach. These evaluations assess alignment with established frameworks and policies, both before and after launch.

AI-assisted evaluations help us scale our risk measurement. AI autoraters streamline evaluation and labeling processes. Synthetic testing data expedites scaled measurement. And automatic testing for security vulnerabilities helps us assess code risks in real time.

Manage

We deploy and evolve mitigations to manage content safety, privacy, and security, such as safety filters and jailbreak protections. We often phase our launches with audience-specific testing, and conduct post-launch monitoring of user feedback for rapid remediation.

We work to advance user understanding of AI through innovative developments in provenance technology, our research-backed explainability guidelines, and AI literacy education.

To support the broader ecosystem, we provide research funding, as well as tools designed for developers and users. We also promote industry collaboration on the development of standards and best practices.
Summary of our responsible AI outcomes to date

Building AI responsibly requires collaboration across many groups, including researchers, industry experts, governments, and users. We are active contributors to this ecosystem, working to maximize AI's potential while safeguarding safety, privacy, and security.

- 300+ research papers on AI responsibility and safety topics
- $120 million for AI education and training around the world
- Achieved a "mature" rating for Google Cloud AI in a third-party evaluation of readiness through the NIST AI Risk Management Framework governance and ISO/IEC 42001 compliance
- Certified Gemini app, Google Cloud, and Google Workspace through the ISO/IEC 42001 process
- 19,000 security professionals have taken the SAIF Risk Self Assessment to receive a personalized report of AI risks relevant to their organization
- Partnered on AI responsibility with outside groups and institutions like the Frontier Model Forum, the Partnership on AI, the World Economic Forum, MLCommons, Thorn, the Coalition for Content Provenance and Authenticity, the Digital Trust & Safety Partnership, the Coalition for Secure AI, and the Ad Council
Govern: Full-stack AI governance

We take a full-stack approach to AI governance—from responsible model development and deployment to post-launch monitoring and remediation. Our policies and principles guide our decision-making, with clear requirements at the pre- and post-launch stages, leadership reviews, and documentation.

Policies and principles

Our governance process is grounded in our principles and frameworks:

AI Principles. We established and evolve our AI Principles to guide our approach to developing and deploying AI models and applications. Core to these Principles is pursuing AI efforts where the likely overall benefits substantially outweigh the foreseeable risks.

Model safety framework. The Frontier Safety Framework, which we recently updated, helps us to proactively prepare for potential risks posed by more powerful future AI models. The Framework follows the emerging approach of Responsible Capability Scaling proposed by the U.K.'s AI Safety Institute.

Content safety policies. Our policies for mitigating harm in areas such as child safety, suicide, and self-harm have been informed by years of research, user feedback, and expert consultation. These policies guide our models and products to minimize certain types of harmful outputs. Some individual applications, like the Gemini app, also have their own policy guidelines. We also prioritize neutral and inclusive design principles, with a goal of minimizing unfair bias. And we have Prohibited Use Policies governing how people can engage with our AI models and features.

Security and privacy framework. Our Secure AI Framework focuses on the security and privacy dimensions of AI.

Application-specific development frameworks. In addition to Google-wide frameworks and policies, several of our applications have specific frameworks to guide their day-to-day development and operation.

Our approach to the Gemini app guides our day-to-day development of the app and its behavior. We believe the Gemini app should:

1. Follow your directions. Gemini's top priority is to serve you well.
2. Adapt to your needs. Gemini strives to be the most helpful AI assistant.
3. Safeguard your experience. Gemini aims to align with a set of policy guidelines and is governed by Google's Prohibited Use Policy.
Pre- and post-launch reviews

We operationalize our principles, frameworks, and policies through a system of launch requirements, leadership reviews, and post-launch requirements designed to support continuous improvement.

Model requirements. Governance requirements for models focus on filtering training data for quality, model performance, and adherence to policies, as well as documenting training techniques in technical reports and model cards. These processes also include safety, privacy, and security criteria.

Application requirements. Launch requirements for applications address risks and include testing and design guidance. For example, an application that generates audiovisual content is required to incorporate a robust provenance solution like SynthID. These requirements are based on the nature of the product, its intended user base, planned capabilities, and the types of output involved. For example, an application made available to minors may have additional requirements in areas like parental supervision and age-appropriate content.
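To illustrate how requirements of this kind can be made mechanical, here is a minimal sketch that encodes a few audience- and capability-based rules as data and checks an application against them. The rule names and the Application fields are hypothetical illustrations, not our actual launch tooling:

```python
from dataclasses import dataclass, field

@dataclass
class Application:
    """Illustrative launch metadata for an AI application (hypothetical fields)."""
    name: str
    generates_audiovisual: bool = False
    available_to_minors: bool = False
    mitigations: set = field(default_factory=set)

def required_mitigations(app: Application) -> set:
    """Derive launch requirements from the nature of the product (rules are illustrative)."""
    required = {"safety_evaluation", "privacy_review", "security_review"}
    if app.generates_audiovisual:
        # e.g. a robust provenance solution such as watermarking
        required.add("provenance_solution")
    if app.available_to_minors:
        required |= {"parental_supervision", "age_appropriate_content"}
    return required

def launch_gaps(app: Application) -> set:
    """Requirements not yet satisfied; an empty set means ready for review."""
    return required_mitigations(app) - app.mitigations

app = Application(
    name="video_generator_demo",
    generates_audiovisual=True,
    mitigations={"safety_evaluation", "privacy_review", "security_review"},
)
print(launch_gaps(app))  # -> {'provenance_solution'}
```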
Leadership reviews. Executive reviewers with expertise in responsible AI carefully assess evaluation results, mitigations, and risks before making a launch decision. They also oversee our frameworks, policies, and processes, ensuring that these evolve to account for new modalities and capabilities.

Post-launch requirements. Our governance continues post-launch with assessments for any issues that might arise across products. Post-launch governance identifies unmitigated residual and emerging risks, and opportunities to improve our models, applications, and our governance processes.

Launch infrastructure. We are evolving our infrastructure to streamline AI launch management, responsibility testing, and mitigation progress monitoring.
Documentation

We foster transparency and accountability throughout our AI governance processes.

Model documentation. External model cards and technical reports are published regularly as transparency artifacts. Technical reports provide details about how our most advanced AI models are created and how they function. This includes offering clarity on the intended use cases, any potential limitations of the models, and how our models are developed in collaboration with safety, privacy, security, and responsibility teams. In addition, we publish model cards for our most capable models and open models. These cards offer summaries of technical reports in a "nutrition label" format to surface vital information needed for downstream developers or to help policy leaders assess the safety of a model.

Data and model lineage. We are investing in robust infrastructure to support data and model lineage tracking, enabling us to understand the origins and transformations of data and models used in our AI applications.
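As a toy illustration of what lineage tracking records, the sketch below gives each dataset or model a pointer to its parents, so a model's full ancestry can be walked back to its source data. The artifact names and fields are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Artifact:
    """Toy lineage record: every dataset or model points at what it was derived from."""
    name: str                       # e.g. "web_corpus_v3" or "chat_model_v2"
    kind: str                       # "dataset" or "model"
    parents: list = field(default_factory=list)
    transformation: str = "source"  # e.g. "safety_filtering", "fine_tuning"

def provenance(artifact: Artifact, depth: int = 0) -> None:
    """Print the full ancestry of an artifact, back to its source datasets."""
    print("  " * depth + f"{artifact.kind}: {artifact.name} [{artifact.transformation}]")
    for parent in artifact.parents:
        provenance(parent, depth + 1)

raw = Artifact("web_corpus_v3", "dataset")
filtered = Artifact("web_corpus_v3_filtered", "dataset", [raw], "safety_filtering")
model = Artifact("chat_model_v2", "model", [filtered], "fine_tuning")
provenance(model)
```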
Our responsible AI approach reflects key concepts in industry guidelines like the NIST AI Risk Management Framework—govern, map, measure, and manage.

Govern: A proactive governance approach to responsible AI development and deployment
Map: Identify current, emerging, and potential future AI risks
Measure: Evaluate and monitor identified risks and enhance testing methods
Manage: Establish and implement relevant and effective mitigations
Case study: Promoting AI transparency with model cards

Model cards were introduced in a Google research paper in 2019 as a way to document and provide transparency about how we evaluate models. That paper proposed some basic model card fields that would help provide model end users with the information they need to evaluate how and when to use a model. Many of the fields first proposed remain vital categories of metadata that are found in model cards across the industry today.

Previous iterations of our model cards, such as one to predict 3D facial surface geometry and one for an object detection model, conveyed important information about those respective models. However, as generative AI models have advanced, we have adapted our most recent model cards, such as the card for our highest quality text-to-image model Imagen 3, to reflect the rapidly evolving landscape of AI development and deployment. While these model cards still contain some of the same categories of metadata we originally proposed in 2019, they also prioritize clarity, practical usability, and include an assessment of a model's intended usage, limitations, risks and mitigations, and ethical and safety considerations.

As models continue to evolve, we will work to recognize the key commonalities between models in these model cards. By identifying these commonalities, while also remaining flexible in our approach, we can use model cards to support a shared understanding and increased transparency around how models work.

The model card fields suggested in our 2019 research paper "Model Cards for Model Reporting":

Model Details. Basic information about the model.
- Person or organization developing model
- Model date
- Model version
- Model type
- Information about training algorithms, parameters, fairness constraints or other applied approaches, and features
- Paper or other resource for more information
- Citation details
- License
- Where to send questions or comments about the model

Intended Use. Use cases that were envisioned during development.
- Primary intended uses
- Out-of-scope use cases

Factors. Factors could include demographic or phenotypic groups, environmental conditions, technical attributes, or others.
- Relevant factors
- Evaluation factors

Metrics. Metrics should be chosen to reflect potential real-world impacts of the model.
- Model performance measures
- Decision thresholds
- Variation approaches

Evaluation Data. Details on the dataset(s) used for the quantitative analyses in the card.
- Datasets
- Motivation
- Preprocessing

Training Data. May not be possible to provide in practice. When possible, this section should mirror Evaluation Data. If such detail is not possible, minimal allowable information should be provided here, such as details of the distribution over various factors in the training datasets.

Quantitative Analyses
- Unitary results
- Intersectional results

Ethical Considerations

Caveats and Recommendations
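These fields translate naturally into structured metadata. Below is a minimal sketch that renders the 2019 schema as a machine-readable record; the class and field names are our own shorthand for the categories above, not an official format:

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """The 2019 model card fields rendered as a machine-readable record (illustrative)."""
    # Model Details
    developer: str
    model_date: str
    model_version: str
    model_type: str
    training_info: str             # algorithms, parameters, fairness constraints, features
    paper_or_resource: str
    citation: str
    license: str
    contact: str
    # Intended Use
    primary_intended_uses: list = field(default_factory=list)
    out_of_scope_uses: list = field(default_factory=list)
    # Factors (demographic or phenotypic groups, environments, technical attributes)
    relevant_factors: list = field(default_factory=list)
    evaluation_factors: list = field(default_factory=list)
    # Metrics (chosen to reflect potential real-world impacts)
    performance_measures: dict = field(default_factory=dict)
    decision_thresholds: dict = field(default_factory=dict)
    variation_approaches: str = ""
    # Evaluation Data / Training Data
    evaluation_datasets: list = field(default_factory=list)
    evaluation_motivation: str = ""
    evaluation_preprocessing: str = ""
    training_data_notes: str = ""  # may only be a distribution summary in practice
    # Quantitative Analyses and considerations
    unitary_results: dict = field(default_factory=dict)
    intersectional_results: dict = field(default_factory=dict)
    ethical_considerations: str = ""
    caveats_and_recommendations: str = ""
```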
Map: Identifying and understanding risks

We take a scientific approach to mapping AI risks through research and expert consultation, codifying these inputs into a risk taxonomy. Our mapping process is fundamentally iterative, evolving alongside the technology, and adapting to the range of contexts in which people use AI models or applications.

Risk research

We've published more than 300 papers on responsible AI topics, and collaborated with research institutions around the world. Recent areas of focus include:

Research on novel AI capabilities. We research the potential impact of emerging AI capabilities such as new modalities and agentic AI, to better understand if and how they materialize, as well as identifying potential mitigations and policies.

Research on emerging risks from AI. We also invest in research on the potential emerging risks from AI in areas like biosecurity, cybersecurity, self-proliferation, dangerous capabilities, misinformation, and privacy, to evolve our mitigations and policies.

Research on AI misuse. Mapping the potential misuse of generative AI has become a core area of research, and contributes to how we assess and evaluate our own models in these risk areas, as well as potential mitigations. This includes recent research into how government-backed threat actors are trying to use AI and whether any of this activity represents novel risks.

External domain expertise

We augment our own research by working with external domain experts and trusted testers who can help further our mapping and understanding of risks.

External expert feedback. We host workshops and demos at our Google Safety Engineering Centers around the world and industry conferences, garnering insights across academia, civil society, and commercial organizations.

Trusted testers. Teams can also leverage external trusted testing groups who receive secure access to test models and applications according to their domain expertise.

Risk taxonomy

We've codified our mapping work into a taxonomy of potential risks associated with AI, building on the NIST AI Risk Management Framework and informed by our experiences developing and deploying a wide range of AI models and applications. These risks span safety, privacy, and security, as well as transparency and accountability risks such as unclear provenance or lack of explainability. This risk map is designed to enable clarity around which risks are most relevant to understand for a given launch, and what might be needed to mitigate those risks.
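As a simplified illustration of how such a risk map can be put to work, the sketch below keys a taxonomy by risk area and derives the evaluations relevant to a given launch. The categories shown are illustrative stand-ins, not our full taxonomy:

```python
# Hypothetical sketch: a risk taxonomy keyed by risk area, mapping each area
# to the per-risk evaluations a launch in that area would need.
RISK_TAXONOMY = {
    "safety":       ["child_safety", "self_harm", "dangerous_capabilities"],
    "privacy":      ["memorization_of_personal_data", "sensitive_attribute_inference"],
    "security":     ["prompt_injection", "jailbreak", "data_exfiltration"],
    "transparency": ["unclear_provenance", "lack_of_explainability"],
}

def evaluations_for_launch(risk_areas: list) -> list:
    """Return the evaluations relevant to the risk areas a launch touches."""
    return [risk for area in risk_areas for risk in RISK_TAXONOMY[area]]

print(evaluations_for_launch(["safety", "security"]))
```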
A selection of our latest research publications focused on responsible AI

June 2024
- Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data
- Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

July 2024
- On Scalable Oversight with Weak LLMs Judging Strong LLMs
- Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
- ShieldGemma: Generative AI Content Moderation Based on Gemma

August 2024
- Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
- Imagen 3

September 2024
- Knowing When to Ask - Bridging Large Language Models and Data
- Operationalizing Contextual Integrity in Privacy-Conscious Assistants
- A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models

October 2024
- New Contexts, Old Heuristics: How Young People in India and the US Trust Online Content in the Age of Generative AI
- All Too Human? Mapping and Mitigating the Risk from Anthropomorphic AI
- Gaps in the Safety Evaluation of Generative AI
- Insights on Disagreement Patterns in Multimodal Safety Perception across Diverse Rater Groups
- STAR: SocioTechnical Approach to Red Teaming Language Models

November 2024
- A New Golden Age of Discovery: Seizing the AI for Science Opportunity

December 2024
- Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice

January 2025
- Adversarial Misuse of Generative AI
- How we Estimate the Risk from Prompt Injection Attacks on AI Systems
Case study: Mapping and addressing risks to safely deploy AlphaFold 3

In May 2024, Google DeepMind released AlphaFold 3, an AI model capable of predicting the structures of molecules and how they interact, which holds the promise of transforming scientists' understanding of the biological world and accelerating drug discovery. Scientists can access the majority of its capabilities, for free, through our AlphaFold Server, an easy-to-use research tool, or via open code and weights.

We carried out extensive research throughout AlphaFold 3's development to understand how it might help or pose risks to biosecurity. Over the course of AlphaFold's development, we consulted with more than 50 external experts across various fields, including DNA synthesis, virology, and national security, to understand their perspectives on the potential benefits and risks.

An ethics and safety assessment was conducted with external experts, in which potential risks and benefits of AlphaFold 3 were identified and analyzed, including their potential likelihood and impact. This assessment was grounded in the specific technical capacities of the model and compared the model to other resources like the Protein Data Bank and other AI biology tools. The assessment was then reviewed by a council of senior internal experts in AI responsibility and safety, who provided further feedback.

As with all Google DeepMind models, AlphaFold 3 was developed, trained, stored, and served within Google's infrastructure, supported by security teams, engineers, and researchers. Quantitative and qualitative techniques are used to monitor the adoption and impact of AlphaFold 3. We partnered with the European Bioinformatics Institute of the European Molecular Biology Laboratory (EMBL) to launch free tutorials on how to best use AlphaFold that more than 10,000 scientists have accessed. We are currently expanding the course and partnering with local capacity builders to accelerate the equitable adoption of AlphaFold 3.

To continue to identify and map emerging risks and benefits from AI to biosecurity, we contribute to civil society and industry efforts such as the U.K. National Threat Initiative's AI-Bio Forum and the Frontier Model Forum, as well as engaging with government bodies.

AlphaFold is accelerating breakthroughs in biology with AI, and has revealed millions of intricate 3D protein structures, helping scientists understand how life's molecules interact.
Measure: Assessing risks and mitigations

After identifying and understanding risks through mapping, we systematically assess our AI systems through red teaming exercises. We evaluate how well our models and applications perform, and how effectively our risk mitigations work, based on benchmarks for safety, privacy, and security. Our approach evolves with developments in the underlying technology, new and emerging risks, and as new measurement techniques emerge, such as AI-assisted evaluations.
Multi-layered red teaming

Red teaming exercises, conducted both internally and externally, proactively assess AI systems for weaknesses and areas for improvement. Teams working on these exercises collaborate to promote information sharing and industry alignment in red teaming standards.

Security-focused red teaming. Our AI Red Team combines security and AI expertise to simulate attackers who might target AI systems. Based on threat intelligence from teams like the Google Threat Intelligence Group, the AI Red Team explores and identifies how AI features can cause security issues, recommends improvements, and helps ensure that real-world attackers are detected and thwarted before they cause damage.

Content-focused red teaming. Our Content Adversarial Red Team (CART) proactively identifies weaknesses in our AI systems, enabling us to mitigate risks before product launch. CART has conducted over 150 red teaming exercises across various products. Our internal AI tools also assist human expert red teamers and increase the number of attacks they're able to test for.

External red teaming partnerships. Our external red teaming includes live hacking events such as DEF CON and Escal8, targeted research grants, challenges, and vulnerability rewards programs to complement our internal evaluations.

AI-assisted red teaming. To enhance our approach, we have developed forms of AI-assisted red teaming—training AI agents to find potential vulnerabilities in other AI systems, drawing on work from gaming breakthroughs like AlphaGo. For example, we recently shared details of how we used AI-assisted red teaming to understand how vulnerable our systems may be to indirect prompt injection attacks, and to inform how we mitigate the risk.
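The core loop of AI-assisted red teaming can be sketched as an attacker model proposing inputs, the target system responding, and a detector flagging successes. The sketch below is schematic; attacker_model, target_system, and violation_detector are hypothetical stand-ins for real components:

```python
# Minimal sketch of AI-assisted red teaming: one model proposes attack prompts,
# the target system responds, and a detector flags possible policy violations.

def red_team(attacker_model, target_system, violation_detector, rounds: int = 20):
    findings = []
    seed = "Try to make the assistant reveal its hidden system instructions."
    for _ in range(rounds):
        attack = attacker_model(seed)              # generate a candidate attack prompt
        response = target_system(attack)           # probe the system under test
        if violation_detector(attack, response):   # did the attack succeed?
            findings.append((attack, response))
        # feed the last attempt back so the attacker varies its strategy
        seed = f"Previous attempt: {attack!r}. Propose a different strategy."
    return findings
```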
Model and application evaluations

A core component of our measurement approach is running evaluations for models and applications. These evaluations primarily focus on known risks, in contrast to red teaming, which focuses on known and unknown risks.

Model evaluations. A subset of the mapped risks is relevant to test at the model level. For example, as we prepared to launch Gemini 1.5 Pro, we evaluated the model for risks such as self-proliferation, offensive cybersecurity, child safety harms, and persuasion. We also develop new evaluations in key areas—such as our work on FACTS Grounding, which is a benchmark for evaluating how accurately LLMs ground their responses in provided source material and avoid hallucinations.
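A grounding evaluation of this kind can be approximated with an LLM judge that checks each claim in a response against the source document. The sketch below is illustrative and assumes a hypothetical judge callable; it is not the FACTS Grounding implementation:

```python
def grounding_score(judge, document: str, claims: list) -> float:
    """Fraction of a response's claims that the judge finds supported by the source."""
    if not claims:
        return 1.0  # vacuously grounded: nothing was asserted
    supported = sum(
        judge(
            f"Document:\n{document}\n\nClaim: {claim}\n"
            "Is the claim supported by the document? Answer 'supported' or 'unsupported'."
        ) == "supported"
        for claim in claims
    )
    return supported / len(claims)
```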
Application evaluations. These evaluations are designed to assess the extent to which a given application follows the frameworks and policies that apply to that application. This pre-launch testing generally covers a wide range of risks spanning safety, privacy, and security, and this portfolio of testing results helps inform launch decisions. We also invest in systematic post-launch testing that can take different forms, such as running regression testing for evaluating an application's ongoing alignment with our frameworks and policies, and cross-product evaluations to identify whether known risks for one application may have manifested in other applications.
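One simple form of regression testing is to re-run a fixed evaluation suite on each release and flag risk areas whose violation rates have worsened beyond a tolerance. The sketch below is illustrative; run_suite stands in for a real evaluation harness returning per-area violation rates:

```python
# Illustrative post-launch regression test: re-run a fixed prompt suite against a
# new application version and flag any risk area whose violation rate regressed.

def regression_check(run_suite, app_version: str, baseline: dict,
                     tolerance: float = 0.01) -> dict:
    """Return {risk_area: violation_rate} for areas worse than baseline + tolerance."""
    current = run_suite(app_version)  # e.g. {"child_safety": 0.002, "privacy": 0.004}
    return {
        area: rate
        for area, rate in current.items()
        if rate > baseline.get(area, 0.0) + tolerance
    }
```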
AI-assisted evaluations

As AI continues to scale, it's critical that our ability to measure risks scales along with it. That's why we're investing in automated testing solutions, which can run both before launch and on an ongoing basis after release.

AI autoraters. At the model layer, Gemini