Supervising Agents: The Future of Agentic Supervision (research report)

Uploaded: 2025-06-19 | Format: DOCX | Pages: 61


The Future of Agentic Supervision

ARTEFACT

AI IS ABOUT PEOPLE

WE ACCELERATE DATA AND AI ADOPTION TO POSITIVELY IMPACT PEOPLE AND ORGANIZATIONS.

COUNTRIES

1700 EMPLOYEES

+1000 CLIENTS

Artefact is a global leader in consulting services, specialized in data transformation and data & digital marketing, from strategy to the deployment of AI solutions. We offer a unique combination of innovation (Art) and data science (Fact).

STRATEGY & TRANSFORMATION | AI ACCELERATION | DATA FOUNDATIONS & BI | IT & DATA PLATFORMS | MARKETING DATA & DIGITAL

Executive summary

Last February, we published “The Future of Work with AI”, our first study on Agentic AI. We found that although AI agents will replace humans on tedious and repetitive tasks, a new type of work will appear: Agentic Supervision. During the industrial revolution, machines replaced humans on manual tasks, but new jobs appeared, such as machine purchasing, operational supervision and maintenance. With Agentic AI, cognitive jobs will be replaced by other higher-level and more productive cognitive jobs. This study takes a deep dive into the early days of Agentic Supervision and outlines the future of supervision in terms of agent lifecycle management, governance and supervision tooling.

To capture the current state of Agentic Supervision, we interviewed 14 enterprises and 5 Artefact Agentic Product Managers & Engineers. We also contacted key Agentic Supervision providers, including major Data & AI platforms with years of software supervision experience (such as Google and Microsoft) as well as specialized start-ups (Weights & Biases, Giskard, Robust Intelligence…).

The first insight is that while Agentic Supervision extends the principles established in DevOps (software operations), DataOps (data operations), and MLOps (Machine Learning operations), it dramatically increases the demand for robust governance to keep AI agents aligned and under control. Indeed, with “software that starts to think”, previously unseen risks are emerging, such as hallucination, reasoning errors, inappropriate tone, intellectual property infringement or even prompt jacking. Mitigating these reliability, behavioral, regulatory and security risks now requires governance that is not only more rigorous but also broader than what has previously been applied to tech products.

This markedly greater need for governance is the challenge that may define the emerging operational paradigm of “AgentOps”. Interestingly, AgentOps will need to build upon each organization’s existing DevOps, DataOps, and MLOps foundations and governance, and companies lagging in these operational domains will have to bridge any gaps in these areas while setting up their agentic governance framework.

“We found that although AI agents will replace humans on tedious and repetitive tasks, a new type of work will appear: Agentic Supervision.”

The second major challenge identified by our interviewees is the need to strengthen their AI supervision tooling. Many are currently relying on existing RPA and Dev/Data/MLOps tools, or experimenting with custom-built solutions as they search for more sustainable, long-term options. The abundance of early-stage tools and the need to envision a cohesive, end-to-end supervision system that integrates multiple components prompted us to explore the technological dimensions of agentic supervision in greater depth. As with any TechOps framework, AgentOps supervision involves three fundamental stages: (1) Observe, (2) Evaluate, and (3) Monitor and manage incidents. While the third stage represents the largest share of supervision effort and time, the first two are essential to ensuring effective risk management. With new categories of risks to monitor and, consequently, new logs, traces, and evaluation mechanisms to establish, it’s clear why interviewees consistently emphasized the need for the right tools to support scalable and reliable supervision.

“Supervision should not be an afterthought; it must be embedded early in the agent’s design and development.”

Our research into agentic supervision tools revealed three key insights. First, there is currently no all-in-one solution available. Major cloud providers like Google and Microsoft are actively developing and releasing supervision tools and frameworks aimed at covering the full spectrum of supervision needs for teams building agents on platforms such as Vertex AI (Google) and Copilot Studio (Microsoft). Second, agent supervision falls into two categories: proactive and reactive. Proactive supervision is applied during development to test agents against defined scenarios or, in production, to continuously guard against emerging threats, particularly in the area of security, or to collect aggregated performance data. Its goal is to improve agent behavior over time. Reactive supervision, on the other hand, focuses on detecting and handling live incidents. Although both types rely on observability tools and may use similar evaluation mechanisms, they differ significantly in data sources, evaluation granularity, and response strategies. Finally, our third insight is that agentic observability, evaluation, and risk mitigation remain complex and rapidly evolving domains. We anticipate substantial advancements in supervision tooling over the coming years.

Each phase of the agentic supervision cycle (observe, evaluate, and supervise) presents its own set of challenges.

Observability first requires anticipating what data to capture, which depends heavily on having a clearly defined evaluation and supervision strategy. Without this foresight, teams risk either collecting too little information or being overwhelmed by vast, unstructured traces that hinder manual root cause analysis. Tools like LangSmith and LangChain are increasingly used to structure and streamline the observation of agent behavior. Another major challenge lies in the opacity of LLM reasoning, which must be countered by deliberately designing agent architectures and workflows to ensure traceability and transparency.
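
To make the idea of structured traces concrete, here is a minimal sketch in plain Python. It assumes a hypothetical `TraceEvent` schema and `Tracer` helper that are not part of the study and do not reproduce the API of LangSmith or any other tool; the agent, tool names and payloads are invented for illustration.

```python
import json
import time
import uuid
from dataclasses import dataclass, field, asdict


@dataclass
class TraceEvent:
    """One step of an agent run (hypothetical schema, for illustration only)."""
    run_id: str
    step: int
    kind: str          # e.g. "llm_call", "tool_call", "memory_read"
    name: str          # which model, tool, or store was involved
    inputs: dict
    outputs: dict
    timestamp: float = field(default_factory=time.time)


class Tracer:
    """Collects the events of a single run so they can be queried later."""

    def __init__(self) -> None:
        self.run_id = uuid.uuid4().hex
        self.events: list[TraceEvent] = []

    def record(self, kind: str, name: str, inputs: dict, outputs: dict) -> None:
        self.events.append(
            TraceEvent(self.run_id, len(self.events), kind, name, inputs, outputs)
        )

    def to_jsonl(self) -> str:
        # One JSON object per line keeps traces easy to filter and aggregate.
        return "\n".join(json.dumps(asdict(e)) for e in self.events)


tracer = Tracer()
tracer.record("llm_call", "planner", {"goal": "refund order 123"}, {"plan": ["lookup", "refund"]})
tracer.record("tool_call", "order_api.lookup", {"order_id": "123"}, {"status": "delivered"})

# Because every event is typed, a supervisor can query the run instead of reading raw text.
tool_steps = [e.name for e in tracer.events if e.kind == "tool_call"]
```

Deciding the `kind` values and payload fields before development starts is one way to avoid both under-collection and unstructured trace dumps.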

Evaluation in agentic AI is significantly more complex than in traditional software or data quality assessments. Where deterministic tests based on observability queries are sufficient in classical DevOps and DataOps, agentic systems often require AI to evaluate AI. This has led to the rise of LLM-as-a-judge techniques, a counterintuitive approach where one model assesses the output of another. While this raises concerns (why trust flawed AI to judge flawed AI?), studies show it often produces more consistent and scalable results than human reviewers. Nonetheless, a common pain point among interviewees was the difficulty of building reliable ground truth datasets (expert-curated question-answer pairs) to benchmark agent responses. Human evaluators tend to disagree and often lack completeness in their answers.
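
The shape of an LLM-as-a-judge evaluation against a ground truth dataset can be sketched as follows. This is an illustration under stated assumptions: `toy_judge` is a deterministic placeholder standing in for a real model call, and `GROUND_TRUTH`, `fake_agent` and `evaluate` are hypothetical names invented for the example.

```python
from typing import Callable

# (question, reference_answer) pairs: a tiny stand-in for an expert-curated set.
GROUND_TRUTH = [
    ("What is the refund window?", "30 days"),
    ("Which plan includes SSO?", "Enterprise"),
]

Judge = Callable[[str, str, str], bool]


def toy_judge(question: str, reference: str, answer: str) -> bool:
    """Placeholder for an LLM call: passes if the reference appears in the answer.

    A real judge would prompt a model with a grading rubric; this stub only
    keeps the example runnable and deterministic.
    """
    return reference.lower() in answer.lower()


def evaluate(agent: Callable[[str], str], judge: Judge) -> float:
    """Return the fraction of ground-truth questions the judge marks correct."""
    verdicts = [judge(q, ref, agent(q)) for q, ref in GROUND_TRUTH]
    return sum(verdicts) / len(verdicts)


def fake_agent(question: str) -> str:
    # Hypothetical agent under test: one answer is right, one is wrong.
    answers = {
        "What is the refund window?": "Refunds are accepted within 30 days.",
        "Which plan includes SSO?": "SSO is available on the Pro plan.",
    }
    return answers[question]


score = evaluate(fake_agent, toy_judge)
```

Keeping the judge behind a plain callable means the same harness can be run with a human reviewer, a rule, or a model, which makes it possible to compare their verdicts on the same dataset.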

Finally, supervision and mitigation face challenges around prioritization. With a growing number of metrics and alerts, teams can quickly become overwhelmed. Standardized frameworks for alerting and metric management are a must to bring structure and clarity to agentic supervision.
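
One simple way to bring order to a growing alert stream is to collapse duplicates and rank what remains. The sketch below assumes a hypothetical three-level severity scale and an `Alert` record invented for this example; it is not a framework described by the interviewees.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical severity scale; a real framework would define its own.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


@dataclass(frozen=True)
class Alert:
    agent: str
    metric: str
    severity: str


def prioritize(alerts: list[Alert]) -> list[tuple[Alert, int]]:
    """Collapse duplicate alerts, then order by severity and by frequency."""
    counts = Counter(alerts)  # frozen dataclasses are hashable
    return sorted(
        counts.items(),
        key=lambda item: (SEVERITY_RANK[item[0].severity], -item[1]),
    )


raw = [
    Alert("support-bot", "tone", "minor"),
    Alert("support-bot", "hallucination", "major"),
    Alert("support-bot", "tone", "minor"),
    Alert("billing-bot", "prompt_injection", "critical"),
]
queue = prioritize(raw)
```

Here the two identical tone alerts become a single queue entry with a count of 2, ranked below the critical and major alerts.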

Only a handful of organizations have successfully established effective governance and standards for agentic AI. Those with mature software and data governance frameworks have had a head start, benefiting from strong foundations and a well-established culture of observability and supervision. We observed that leveraging existing software, RPA, and data supervision practices, processes, and tools can significantly accelerate progress. However, the key challenge lies in adapting these to the dynamic risks and evolving toolsets specific to agentic AI, and in building a dedicated, future-ready governance framework. Relying too long on legacy approaches, including deterministic logic and custom-built tools, can become a constraint, limiting teams to narrow, tightly controlled agentic workflows and preventing the adoption of more autonomous, AI-orchestrated agents.

“Agentic Supervision is the Future of Work with AI!”

All interviewees emphasized that the key to effective agentic supervision is anticipation. Supervision should not be an afterthought; it must be embedded early in the agent’s design and development. Setting up observability and evaluation mechanisms only once the agent is in production is too late. Identifying flaws at that stage often means reworking the entire agent, which is far more costly than investing in robust supervision from the start.

The good news is that a variety of tested tool combinations and emerging agentic frameworks are already available. We strongly recommend that enterprise AI governance teams define their own standardized framework and toolset to be applied across all agentic development. This becomes even more critical as agents begin to interconnect, making system-wide control and supervision interoperability essential.

To succeed, AI governance must also align closely with strong IT and Data Governance practices, since agents rely on enterprise data and IT systems to ‘think’ and take ‘action.’ Just as IT and data governance required business involvement in the past, one of the key takeaways from our study is that agentic governance will demand even deeper business engagement.

Unlike traditional software or data supervision, typically handled by IT or data teams (and in the most mature organizations, by a business-led data governance network), agent supervision will need to be business-owned. Given the inherent unpredictability of AI agents, incident responses often require domain expertise. As a result, the business must be actively involved not just in monitoring, but in framing agent behavior from the outset. This represents a significant cultural shift: agentic AI blurs the lines between IT, data, and business, and will require new ways of working based on cross-functional collaboration. Agentic Supervision is the Future of Work with AI!

Florence Bénézit

Expert Partner Data & AI Governance

Hanan Ouazan

Managing Partner, Lead Generative AI


Methodology

This study is based on a qualitative research approach designed to explore the emerging challenges and governance practices surrounding the early implementations of autonomous AI agents in organizations. By combining expert interviews with an in-depth analysis of the evolving technological landscape, we aimed to map current practices, identify operational needs, and understand the value propositions of available solutions for agent observability, evaluation, and supervision.

We conducted 20+ interviews with professionals directly involved in the deployment, governance, or technical development of agentic systems. These included:

- AI and Data Leaders, such as Chief Data Officers, Heads of AI, and Data Platform Directors, who shared their strategic vision on agent implementation, risk management, and the evolution of data infrastructure.

- Product Managers and Innovation Executives, who offered insights into operational use cases, organizational readiness, and the shift toward agent-centric architectures.

- Compliance, Security, and IT Governance Experts, who provided critical input on regulatory expectations, ethical risks, and the emerging need for real-time control mechanisms tailored to AI agents.

- Founders and Chiefs of Science of AI tooling companies, whose feedback helped assess the state of the market across three key functions: observability, evaluation, and active supervision of AI agents.

Interviewees represented a diverse range of organizations, including major corporations (in sectors such as energy, telecom, pharmaceuticals, and luxury), global tech players, and high-growth startups, ensuring a rich and nuanced understanding of the topic.

In parallel, we conducted a systematic review of over a dozen tools and platforms offering capabilities relevant to agent governance, including Langfuse, LangSmith, DeepEval, Copilot Studio, Vertex AI, Ragas, Weights & Biases, PRISM Eval, Robust Intelligence, Giskard… Each solution was analyzed using a dedicated framework that cross-referenced three dimensions of quality (Reliability, Behavioral Alignment, Security) with three stages of supervision (Observation, Evaluation, Active Supervision).
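
The cross-referencing framework amounts to a 3 x 3 grid. As an illustration only, it could be represented as below; the `empty_grid` and `coverage_gaps` helpers are hypothetical, "ExampleTool" is a made-up product, and the cells marked as covered are not findings from the study.

```python
from itertools import product

DIMENSIONS = ("Reliability", "Behavioral Alignment", "Security")
STAGES = ("Observation", "Evaluation", "Active Supervision")


def empty_grid() -> dict[tuple[str, str], bool]:
    """One cell per (quality dimension, supervision stage) pair."""
    return {cell: False for cell in product(DIMENSIONS, STAGES)}


def coverage_gaps(grid: dict[tuple[str, str], bool]) -> list[tuple[str, str]]:
    """List the cells a tool does not cover, in grid order."""
    return [cell for cell, covered in grid.items() if not covered]


# "ExampleTool" is made up; the cells ticked below are not findings from the study.
example_tool = empty_grid()
example_tool[("Reliability", "Evaluation")] = True
example_tool[("Security", "Evaluation")] = True

gaps = coverage_gaps(example_tool)
```

Filling one such grid per tool makes it easy to see which combination of tools leaves no cell uncovered.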

By integrating real-world practitioner feedback with a structured technological benchmark, this study aims to offer a pragmatic and forward-looking perspective on how companies can responsibly scale agentic AI systems.

Special Thanks & Acknowledgments

ENTERPRISE INTERVIEWEES

Yoann Bersihand, VP AI Technology, SCHNEIDER

Arthur Garnier, IT Chief of Staff & Data Scientist, ARDIAN

Jean-François Guilmard, CDO, ACCOR

Paul Saffers, Deputy CDO, VEOLIA

Alexis Vaillant, Head Automatisation, ORANGE

Leo Wang, Data Protection Officer, LOUIS VUITTON CHINA

AGENTOPS STACK INTERVIEWEES

Alex Combessie, Co-founder & Co-CEO, GISKARD

Salomé Froment, Account Director France, WEIGHTS & BIASES

Éric Horesnyi, Head of AI Go-To-Market, GOOGLE FRANCE

Amin Karbasi, Senior Director, CISCO FOUNDATION AI RESEARCH (Former Chief Scientist at Robust Intelligence)

Jean-Luc Laurent, Generative AI/ML Specialist, GOOGLE

Pierre Peigné, Co-founder and Chief Science Officer, PRISM Eval

Chris Van Pelt, Co-founder & CISO, WEIGHTS & BIASES

Marc Gardette, Deputy CTO, MICROSOFT FRANCE

TABLE OF CONTENTS

Introduction

I - Agentic AI risks are shaking up the tech governance & supervision game.

Agentic AI or when software starts to think.

New tech, old problems: why governance is a continuum.

No more watching from the sidelines: Agentic AI puts supervision in business hands.

II - The new AgentOps stack: tests, guardrails and feedback loops.

Pre-production testing must embrace variability to ensure agent readiness.

Guardrails protect operations by managing risks during agent execution.

Agent supervision spans from immediate runtime actions to future planning decisions.

III - Secure and accelerate Agentic AI with standards & global governance.

Technical teams need clear standards to build and deploy agents efficiently and responsibly.

Scaling multi-agent systems requires shared protocols for interoperability and manageability.

Business teams need to organize global AI governance and supervision protocols.

Conclusion


Introduction

If, as shown in our previous study, the future of work with AI lies in supervising AI agents, then it is essential to ensure that this new form of work becomes a better experience than the cognitive tasks it replaces. Manually overseeing every step and decision made by an agent would quickly become a tedious task, even more draining than solving the problem directly ourselves. So, how can we do better? This study explores what’s truly at stake in agentic supervision and how early tools are beginning to shape what this new type of work might look like.

We take a broad view of what supervision means. It starts with setting up automated logging and tracing systems. It also involves designing evaluation and alerting frameworks that guide the final and most visible step: taking action (manually correcting mistakes, relaunching an agentic task with better context, mitigating incidents, identifying areas for improvement, and prioritizing development efforts). Supervising agents mirrors many aspects of human collaboration: defining job descriptions (agent objectives), recruiting (designing and deploying new agents), training and coaching (monitoring and updating behavior), and ongoing collaboration (providing inputs and support to agents, but also learning from agents and the business context they collect in their memory).

We believe that the supervision of a single agent will not fall to just one person. Agentic supervision is inherently multidimensional. For instance, business operations may oversee relevance and accuracy; ethics teams, compliance and tone; business leaders, value and economic viability; and cybersecurity teams, safety and the mitigation of malicious attack risks.

This study focuses on best practices for agentic governance, supervision processes, and the supporting tools. While this domain is still emerging and likely to evolve significantly, we also observe strong continuity with established practices from software, RPA, data, and ML supervision. Despite the unique challenges posed by the probabilistic behavior of AI agents, many stable foundations already exist. Embracing these foundations now is critical to ensuring the success of early agentic initiatives.



Agentic AI risks are shaking up the tech governance & supervision game.

I.A - Agentic AI or When Software Starts to Think.

I.B - New Tech, Old Problems: Why Governance Is a Continuum.

I.C - No more watching from the sidelines: Agentic AI puts supervision in business hands.

I.A Agentic AI or When Software Starts to Think.

AI agents radically differ from software: they are autonomous and goal-driven.

Traditional software follows predetermined logic, and chatbots operate within rigid templates and deterministic decision trees. In contrast, agentic AI systems go much further: they interpret context, plan actions, and execute tasks by chaining decisions across various tools and APIs. These agents don’t simply wait for user commands; they pursue objectives, evaluate intermediate outcomes, and adjust their strategies on the fly. This autonomous reasoning makes them feel less like tools and more like collaborators. Unlike RPA bots (Robotic Process Automation) or even standalone large language models (LLMs), agentic AI systems are goal-oriented and task-complete, built to achieve an outcome, not just follow instructions or generate the most likely next response to a prompt.

This marks a fundamental shift in the software development paradigm. Instead of hardcoding logic upfront, you define goals and set constraints, and the agent autonomously constructs its own plan. It may chain prompts, call APIs, search & query data stores, or even create subgoals as needed. Rather than following a fixed path, the system continuously adapts its actions to what’s most likely to succeed. While this opens the door to major productivity gains, it also disrupts traditional governance models: How do you test a system whose outputs change with every run? How can you control behavior that varies over time, without resorting to constant human oversight and intervention?
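
The "define goals and set constraints" pattern can be illustrated with a toy control loop. This is a deliberately simplified sketch: `run_agent`, its parameters and the `increment` tool are hypothetical, and the `choose_tool` callable stands in for the decision that a language model would make in a real agent.

```python
from typing import Callable

Tool = Callable[[dict], dict]


def run_agent(
    goal_reached: Callable[[dict], bool],
    choose_tool: Callable[[dict], str],
    tools: dict[str, Tool],
    state: dict,
    max_steps: int = 5,  # constraint: hard cap on autonomous steps
) -> tuple[dict, list[str]]:
    """Loop until the goal predicate holds or the step budget runs out."""
    history: list[str] = []
    for _ in range(max_steps):
        if goal_reached(state):
            break
        name = choose_tool(state)  # in a real agent, an LLM decides here
        state = tools[name](state)
        history.append(name)
    return state, history


# Toy task: reach a balance of at least 3 by calling an "increment" tool.
tools = {"increment": lambda s: {**s, "balance": s["balance"] + 1}}
final_state, history = run_agent(
    goal_reached=lambda s: s["balance"] >= 3,
    choose_tool=lambda s: "increment",
    tools=tools,
    state={"balance": 0},
)
```

Nothing in the loop fixes the path in advance: only the goal predicate and the step budget are specified, which is why testing has to target outcomes and recorded histories and cannot rely on a known sequence of instructions.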

“What’s different with agents is that they don’t just follow a script. They interpret instructions, decide how to achieve goals, and often infer more than you told them to. That opens up a new layer of unpredictability. You’re not supervising code, you’re supervising intent.”

Arthur GRENIER

IT Chief of Staff & Senior Data Scientist

ARDIAN

Agentic AI can’t be made 100% predictable and calls for governance reinvention to balance value and risks.

The first generation of automation tools, including RPA, macros and rule-based bots, offered predictability by design. They mimicked user actions step by step, within well-defined workflows. Even traditional Machine Learning systems, despite their internal complexity and probabilistic nature, operated within clear boundaries: structured inputs and outputs. In contrast, LLMs accept unstructured text inputs and can generate a wide range of outputs, often in unpredictable formats. Agentic AI increases behavioral complexity even further: agents navigate dynamic environments, draw on multiple knowledge sources, and adapt their actions autonomously in real time. Their behavior is influenced not just by training data or predefined rules, but by human prompts, tool usage, memory state, and implicit knowledge baked into their foundation models.

Legacy governance models relied on deterministic input-output control: supply test data, verify results, trace bugs. But agentic systems blur that line. A single prompt might lead to hallucinations, multiple API calls, tool interactions, or memory recalls, all potentially different each time. This abstraction between intent and execution creates a governance control gap in terms of technical visibility, process readiness and accountability: rules can be bypassed, edge cases overlooked, and behavioral regressions may go unnoticed until they cause real issues.

As a result, supervising agents shifts the weight of effort from verifying code to observing pairs of inputs and outputs, and piecing together their decision-making retrospectively. As with software and data management, this observation & analysis effort happens both offline, before deployment, on ground truth or synthetic data, and online, on production data. All interviewees stressed the importance of setting up agentic supervision upfront, both to rigorously test agents during development and to anticipate online supervision, accountability and remediation processes.

“Unlike traditional software, AI development is fundamentally probabilistic. Code is no longer the core IP, learning is. What matters is knowing what works, what doesn’t, and why.”

Chris Van Pelt

Co-founder & CISO

This unpredictability shift introduces the need for large-scale, statistical value & risk evaluation.

As a consequence of this unpredictability, the emergence of agentic AI has introduced a profound control challenge: traditional QA (Quality Assurance) methods are no longer adequate. Previously, a handful of unit tests matching fixed inputs to their expected deterministic outputs was enough to validate hardcoded logic. In contrast, AI agents now require testing across a broad spectrum of possible inputs, with each test scenario run rigorously and repeatedly to account for their non-deterministic behavior. On top of that, evaluating their performance means interpreting unstructured and variable text outputs, which makes it much harder to consistently define and measure what “quality” really means. Output quality may need to be assessed along multiple dimensions, including factual accuracy, completeness, security, and alignment with user intent.
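
Running a scenario repeatedly turns a pass/fail test into a pass-rate estimate with an uncertainty band. The sketch below uses the Wilson score interval, which is our choice for illustration and is not named in the study; `flaky_agent`, its 90% success rate and the sample scenario are simulated assumptions.

```python
import math
import random
from typing import Callable


def wilson_interval(passes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 is roughly 95% confidence)."""
    p = passes / runs
    denom = 1 + z**2 / runs
    centre = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return centre - half, centre + half


def pass_rate(scenario: str, agent: Callable[[str], str],
              check: Callable[[str], bool], runs: int) -> tuple[int, int]:
    """Run the same scenario repeatedly and count how often the check passes."""
    passes = sum(check(agent(scenario)) for _ in range(runs))
    return passes, runs


rng = random.Random(0)  # seeded so the example is reproducible


def flaky_agent(prompt: str) -> str:
    # Hypothetical agent that answers correctly about 90% of the time.
    return "42" if rng.random() < 0.9 else "I am not sure"


passes, runs = pass_rate("What is 6 x 7?", flaky_agent, lambda out: out == "42", runs=200)
low, high = wilson_interval(passes, runs)
```

A release gate can then be phrased statistically, for example by requiring that `low` stays above an agreed threshold for every critical scenario, where a deterministic unit test would simply demand a single green run.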

Once quality is assessed, a second challenge emerges: identifying the root causes of agent failures, to support improvement or manage run-time incidents. This requires detailed, transparent logging of the agent’s reasoning process, accessible to a diverse set of supervising stakeholders: developers, compliance officers, business owners, and domain experts alike.

“The need to close this supervision and governance gap arises very early in the enterprise agentic journey.”

The need to close this supervision and governance gap arises very early in the enterprise agentic journey. As agentic systems begin interpreting complex business contexts and making autonomous decisions, the risks and responsibilities grow. While agents are already being deployed in enterprise pilots across various functions, the technical, organizational, and legal infrastructures required for robust supervision remain underdeveloped. Legacy governance frameworks are insufficient, and enterprises need to upgrade them with a new, test-intensive, purpose-built approach.

“After the Digital and Mobile revolutions, we are now entering a third wave of media disruption: AI agents. These agents will increasingly mediate our interactions with companies, transforming how we search, learn, shop, work, and communicate. Imagine that in 2030, 40% of interactions between consumers and companies will be shaped by AI. But how do we control the reliability and security risks of these agents?”

Alex COMBESSIE

Co-founder & Co-CEO

Giskard


TECHNOLOGY

Giskard is an open-source testing platform designed to ensure the quality, security, and compliance of AI models. It automates the detection of vulnerabilities such as hallucinations, biases, and security flaws in LLMs and agents. Giskard’s features include automated test generation, continuous monitoring, and collaborative tools that facilitate cross-functional teamwork among data scientists, developers, and business stakeholders.

FEATURE COVERAGE

Reliability, Regulatory compliance, Security, FinOps, Latency

OBSERVE.

Giskard does not offer real-time observability features such as tracking latency, token usage, or cost metrics. Its primary focus is on pre-deployment testing and vulnera
