Hi CluelessElephant89
hey guys, I believe clearml-agent-services isn't necessary right?
Generally speaking, yes you are correct 🙂
Specifically, this is the "services" queue agent, running your pipeline logic, services etc.
But it is not a must to get the server to work, and you can also spin it on a different host
Hi OutrageousGrasshopper93
When the Task is executed on a worker, the presence of spaces breaks the URLs and from the UI I cannot access the resources on the bucket
You are saying the URLs generated in a remote execution are "broken", while in local execution they work, even though it is the same project/task name?
AWS autoscaler will work with IAM roles as long as you have it configured on the machine itself. SageMaker job scheduling (I'm assuming this is what you are referring to, and not the notebook) requires you to select the instance as well (basically the same as EC2). What do you mean by using the k8s glue, like inherit and implement the same mechanism but for SageMaker instead of kubectl?
BTW is it cheaper than an EC2 instance? Why not use the AWS autoscaler?
I think my main point is, k8s glue on AKS or GKE basically takes care of spinning new nodes, as the k8s service does that. The AWS autoscaler is kind of a replacement, make sense?
so the thing with IAM roles, they are designed to allow AWS instances to get "automatic" permission (based on the IAM role). They are not actually designed to generate a key/secret, as I think the lifetime is by default relatively short. Since the actual request to S3 comes from the client browser (i.e. outside of the AWS cluster), the IAM role cannot apply, and you have to provide the key/secret. The easiest way is to generate S3 keys regardless of the IAM roles, to be used with the clients (sp...
Basic setup:
glue service per "job template" (e.g. k8s resources, for example cpu requirement, or gpu requirement)
queue per glue service, e.g. a cpu_machine queue and a 1xGPU queue
wdyt?
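To make the routing concrete, here is a minimal sketch of sending work to one of the per-glue queues (the task ID and the cpu_machine / 1xGPU queue names below are placeholders, not from this thread):
```
from clearml import Task

# clone an existing template task and send it to the queue served by the CPU glue service
template = Task.get_task(task_id='aabbcc')          # placeholder task ID
cloned = Task.clone(source_task=template, name='cpu run')
Task.enqueue(cloned, queue_name='cpu_machine')      # placeholder queue name

# the same clone could instead be routed to the GPU glue service's queue
# Task.enqueue(cloned, queue_name='1xGPU')
```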
JitteryCoyote63
IAM role to the web app could access
you mean the web client key/secret to access S3 data ?
TrickySheep9 Yes, let's do that!
How do you PR a change ?
And can I store models with no attachment to tasks?
Assuming you have the Model ID:
from clearml import InputModel
model = InputModel(model_id='aabbcc')
local_file_or_folder = model.get_weights()
Is this what you are looking for?
You mean like for your internal support channel inside your company ?
LazyTurkey38 notice the assumption is that the docker entry-point ends with bash, and only then does the agent take charge. I'm assuming this is not the case, hence the agent spins the docker, then the docker just ends, could that be?
VexedCat68
But what's happening is that I only publish a dataset once, but every time it polls,
this seems wrong (i.e. a bug?!). How do you set up the trigger? Is the Trigger Task constantly running, or are you re-launching it?
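For comparison, a minimal sketch of a dataset trigger that keeps running and launches a task on a new dataset version (I'm assuming the clearml.automation.TriggerScheduler interface; queue, project and task IDs below are placeholders):
```
from clearml.automation import TriggerScheduler

# the scheduler itself is a single long-running task that polls the server
trigger = TriggerScheduler(pooling_frequency_minutes=3)

# launch a copy of an existing task whenever a new dataset version appears
trigger.add_dataset_trigger(
    name='retrain on new data',            # placeholder
    schedule_task_id='aabbcc',             # placeholder: task to clone and enqueue
    schedule_queue='default',              # placeholder queue
    trigger_project='datasets/my_project', # placeholder dataset project
)

# keep polling; this call blocks (or use start_remotely() to run it on the services queue)
trigger.start()
```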
https://stackoverflow.com/questions/5419/python-unicode-and-the-windows-console
Hmm try to set this one before spinning the agent
Windows:
set PYTHONIOENCODING=:replace
Inside Colab:
os.environ["PYTHONIOENCODING"] = ":replace"
but I still think the same should be possible using the Task.init
This is the part that I find confusing:
Task.init(..., output_uri=True)
is working for me. What is the setup that caused this line to "fail"?
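For reference, a minimal sketch of what works on my end (project/task names are placeholders; output_uri=True assumes a default output destination is configured, otherwise pass an explicit bucket URI):
```
from clearml import Task

# upload model/artifact outputs to the default files server,
# or pass an explicit URI such as 's3://my-bucket/folder' instead of True
task = Task.init(
    project_name='examples',      # placeholder
    task_name='output_uri demo',  # placeholder
    output_uri=True,
)
```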
Hi JitteryCoyote63
Or even better: would it be possible to have a support for HTML files as artifacts?
If you report html files as debug media they will be previewed, as long as the link is accessible.
You can check this example:
https://github.com/allegroai/trains/blob/master/examples/reporting/html_reporting.py
In the artifacts, I think html files are also supported (maybe not previewed as nicely, but clickable).
Regarding the s3 link, I think you are supposed to get a popup window as...
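If it helps, a minimal sketch of reporting a local HTML file as debug media so it gets previewed in the UI (project/task names and the file path are placeholders):
```
from clearml import Task

task = Task.init(project_name='examples', task_name='html report')  # placeholders
logger = task.get_logger()

# upload a local HTML file as a debug sample; it will appear under DEBUG SAMPLES
logger.report_media(
    title='reports',
    series='summary',
    local_path='summary.html',  # placeholder path
)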
Could be nice to write some automation
Hi AntsySeagull45
Any chance the original code was running with python2?
Which version of trains-agent are you using?
how would I get an agent to launch in the same instance of my clearml server
Actually that is my point, you do not have to spin the agent on the clearml-server instance. We added the services agent as part of the docker-compose for easier deployment; that said, you can always manually SSH to the server, or run it on any other machine, just like you would spin any other clearml-agent
Does that make sense ?
Is trains-agent using docker-mode or virtual-env ?
Please attach the log 🙂
Hi SolidSealion72
"/tmp" contained alot of artifacts from ClearML past runs (1.6T in our case).
How did you end up with 1.6TB of artifacts there? What are the workflows on that machine? At least in theory, there should not be any leftovers in the tmp folder after the process is completed.
SolidSealion72 this makes sense, clearml deletes artifacts/models after they are uploaded, so I have to assume these are torch internal files
Hi @<1523701079223570432:profile|ReassuredOwl55> let me try to add some color here:
Basically we have two parts: (1) pipeline logic, i.e. the code that drives the DAG, (2) pipeline components, e.g. model verification
The pipeline logic (1), i.e. the code that creates the DAG, creates the tasks and enqueues them, will be running in the GitHub Actions context, i.e. this is the automation code. The pipeline components themselves (2), e.g. model verification, training etc., are running using the clearml agents...
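As an illustration, a minimal sketch of that split (project, task names and queue below are placeholders): the controller code is the pipeline logic (1), while each step runs as its own task on an agent (2):
```
from clearml.automation import PipelineController

# (1) pipeline logic: builds the DAG and enqueues the steps; this part would run
#     in the CI / automation context
pipe = PipelineController(name='model verification pipeline', project='examples', version='1.0')

# (2) pipeline components: each step is an existing task executed by a clearml-agent
pipe.add_step(
    name='verify_model',
    base_task_project='examples',           # placeholder
    base_task_name='model verification',    # placeholder
    execution_queue='default',              # placeholder queue served by an agent
)

# run the controller logic itself locally (e.g. inside the CI job)
pipe.start_locally()
```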
Hi FiercePenguin76
should return all datasets from all projects?
Correct 🙂
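For reference, a minimal sketch of listing datasets across all projects (assuming a recent clearml version where each returned entry is a dict with the dataset's id/name):
```
from clearml import Dataset

# with no project filter this returns datasets from all projects
datasets = Dataset.list_datasets()
for d in datasets:
    print(d['id'], d.get('name'))
```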
VivaciousPenguin66 I have the feeling it is the first space in the URI that breaks the credentials lookup.
Let's test it:
```
from clearml import StorageManager

uri = 'Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt'

# original
StorageManager.get_local_copy(uri)

# quoted
StorageManager.get_local_copy(uri.replace(' ', '%20'))
```
Hi ColossalAnt7
Try ctrl-F5 and refresh the page?!
It seems you are missing a few buttons 🙂
Hi PompousBeetle71 , what exactly is the scenario / problem we are trying to solve ?
current task fetches the correct Task
Assuming you fork the process, then the "global instance" is passed to the subprocess. Assuming the sub-process was spawned (e.g. Popen), then an environment variable with the Task's unique ID is passed. Then when you call Task.current_task it "knows" the Task was already created, and it will fetch the state from the clearml-server and create a new Task object for you to work with.
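A minimal sketch of the spawned-process case (project/task names are placeholders), where the child re-creates the Task object from the ID passed through the environment:
```
import multiprocessing
from clearml import Task


def worker():
    # in the child process, Task.current_task() re-fetches the Task state from the
    # clearml-server using the unique ID passed via an environment variable
    task = Task.current_task()
    print('child sees task:', task.id)


if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')  # force the spawn (not fork) code path
    task = Task.init(project_name='examples', task_name='subprocess demo')  # placeholders
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()
```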
BTW: please use the latest RC (we fixed an issue with exactly this...