Thanks JuicyFox94 for letting us know.
I'm checking what the status is with it
It should all be logged in the end, as I understand it
Hmm let me check the code for a minute
Do people use ClearML with Hugging Face transformers? The code is standard transformers code.
I believe they do 🙂
There is no real way to differentiate between "storing a model" using torch.save and storing configuration ...
Hi PanickyMoth78
can receive access to a GCP project and use GKE to spin up clusters and workers, or would that be on the customer to manage?
It does, and also supports AWS.
That said, only the AWS support is part of the open source, but both are part of the paid tier (I think Azure is in testing)
OutrageousSheep60
I found the task in the UI - and in the UNCOMMITTED CHANGES execution section there is "No changes logged". This is the issue.
and then run the session via docker:
clearml-session --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 \
    --packages "clearml" "tensorflow>=2.2" "keras" \
    --queue MY_QUEUE \
    --verbose
Are you running the "clearml-session" from your machine? (i.e. not from inside a docker)?...
If I were to push the private package to, say, Artifactory, is it possible to use that to do the install?
Yes, that's the recommended way 🙂
You add the private repo here, for the agent to use:
https://github.com/allegroai/clearml-agent/blob/e93384b99bdfd72a54cf2b68b3991b145b504b79/docs/clearml.conf#L65
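As a rough sketch (assuming your Artifactory instance exposes a pip-compatible index; the URL below is a placeholder), the relevant clearml.conf entry would look something like:
agent {
    package_manager {
        # placeholder private index; replace with your Artifactory pypi endpoint
        extra_index_url: ["https://artifactory.example.com/api/pypi/my-repo/simple"]
    }
}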
Hi PricklyJellyfish35
My apologies, this thread was forgotten 🙂
What's the current status with the OmegaConf issue? (I'm not sure I understand what you mean by resolve=False)
Hi ScaryBluewhale66
TaskScheduler I created. The status is still "running". Any idea?
The TaskScheduler needs to actually run in order to trigger the jobs (think cron daemon)
Usually it will be executed on the clearml-agent services queue/machine.
Make sense ?
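For reference, a minimal sketch of running a scheduler on the services queue (the task ID, queue name, and schedule below are made up for illustration):
from clearml.automation import TaskScheduler

# the scheduler has to keep running in order to trigger the jobs (think cron daemon)
scheduler = TaskScheduler()

# placeholder task ID and queue: enqueue a clone of the task every day at 07:00
scheduler.add_task(
    schedule_task_id='aabbccdd11223344',
    queue='default',
    hour=7, minute=0,
)

# run the scheduler itself on the "services" queue so an agent keeps it alive
scheduler.start_remotely(queue='services')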
more like testing, especially before a pipeline
Hmm yes, that makes sense.
Any chance you can open a github issue on it?
Let me see if I understand: basically, do not limit the clone on execute_remotely, right?
When did this PipelineDecorator come out? Looks interesting
A few days ago (I think)
It is very cool! Check out the full object-proxy interaction in the actual pipeline logic - this might be better for your workflow: https://github.com/allegroai/clearml/blob/c85c05ef6aaca4e...
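To give a feel for the interface, here is a minimal sketch of a decorator-based pipeline (step names, project, and values are made up):
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['data'], cache=True)
def step_one():
    # return values are passed between steps as lazy object proxies
    return [1, 2, 3]

@PipelineDecorator.component(return_values=['total'])
def step_two(data):
    return sum(data)

@PipelineDecorator.pipeline(name='demo pipeline', project='examples', version='0.0.1')
def pipeline_logic():
    data = step_one()
    print('total:', step_two(data))

if __name__ == '__main__':
    # run all steps locally for quick testing instead of launching them on agents
    PipelineDecorator.run_locally()
    pipeline_logic()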
481.2130692792125 seconds
This is very slow.
It makes no sense, it cannot be the network (this is basically an HTTP POST, and I'm assuming both machines are on the same LAN, correct?)
My guess is the filesystem on the clearml-server... Are you having any other performance issues ?
(I'm thinking HD degradation, which could lead to slow write speeds, which would affect the Elastic/Mongo as well)
Hi @<1532532498972545024:profile|LittleReindeer37>
Yes, you are correct, it should capture the entire Jupyter notebook in SageMaker Studio.
Just verifying this is the use case, correct ?
unless the domain is different?
Imagine that you are working with both GitHub and Bitbucket, for example: if you are using git-ssh, then git will know which of the domains to send the key to. Currently there is a single user/pass entry, so all domains will get the same credentials. But I think this is a rare use case.
MagnificentPig49 I was not aware of jsonargparse. From what I understand, it's a nicer way to parse JSON configuration files, with an argparse-like interface. Did I get that correctly?
Regarding the missing argparser, you are correct, the auto-magic is not working since jsonargparse is calling an internal ArgParser function and not the external one (hence we miss it).
The quickest fix is adding the following line before you call parse_args():
task.connect(parent_parser)
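A minimal sketch of that workaround (here parent_parser stands in for whatever underlying argparse parser your script builds; the project, task name, and argument are made up):
import argparse
from clearml import Task

task = Task.init(project_name='examples', task_name='jsonargparse workaround')

# parent_parser represents the underlying argparse parser used by jsonargparse
parent_parser = argparse.ArgumentParser()
parent_parser.add_argument('--lr', type=float, default=0.001)

# connect the parser explicitly so ClearML logs the arguments,
# since the automatic argparse hook is bypassed here
task.connect(parent_parser)

args = parent_parser.parse_args()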
hmm that is odd.
Can you send the full log ?
(some packages that are not inside the cache seem to be missing and then everything fails)
How did that happen?
Hi @<1628927672681762816:profile|GreasyKitten62>
Notice that in the GitHub Actions example this specific Task is executed on the GitHub backend, while the Task it creates is executed on the clearml-agent.
So basically:
Action -> Git worker -> task_stats_to_comment.py -> Task Pushed to Queue -> Clearml-Agent -> Task execution is here
Does that make sense ?
One suggestion is to make sure all agents have the same configuration. Another is to add pip into the "installed packages" section.
(Notice that in the next release we will specifically include it there, to avoid these kinds of scenarios)
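If you'd rather handle it from code than edit the section in the UI, a small sketch (project/task names are placeholders) would be:
from clearml import Task

# must be called before Task.init so "pip" ends up in the installed packages section
Task.add_requirements('pip')

task = Task.init(project_name='examples', task_name='with pip requirement')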
Hi @<1643060801088524288:profile|HarebrainedOstrich43>
I think I understand what's going on: in order for the pipeline logic to be "aware" of the pipeline component, it needs to be declared in the pipeline logic script file (or scope, if you will).
Try adding from src.testagentcomponent import step_one also in the global pipeline script scope (not just inside the function).
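Roughly, the pipeline logic script would then look like this (assuming a decorator-based pipeline; the project and pipeline names are made up):
# pipeline_logic.py
from clearml.automation.controller import PipelineDecorator

# module-level import, so the pipeline logic is "aware" of the component
from src.testagentcomponent import step_one

@PipelineDecorator.pipeline(name='my pipeline', project='examples', version='0.0.1')
def pipeline_logic():
    result = step_one()
    print(result)

if __name__ == '__main__':
    pipeline_logic()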
Sorry @<1689446563463565312:profile|SmallTurkey79> just noticed your reply
Hmm, so I know the enterprise version has built-in support for SLURM, which would remove the need to deploy agents on the SLURM cluster.
What you can do is, on the SLURM login server (i.e. a machine that can run sbatch), write a simple script that pulls the Task ID from the queue and calls sbatch with clearml-agent execute --id <task_id_here>. Would this be a good solution?
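A rough sketch of such a script, assuming the APIClient queue calls behave as below and a queue named "slurm" (the queue name and the sbatch invocation are placeholders, not a tested implementation):
import subprocess
import time

from clearml.backend_api.session.client import APIClient

client = APIClient()
# placeholder queue name; resolve it to the queue ID
queue_id = client.queues.get_all(name='slurm')[0].id

while True:
    # pop the next pending task from the queue (if any)
    response = client.queues.get_next_task(queue=queue_id)
    entry = getattr(response, 'entry', None)
    if entry:
        # submit a batch job that executes the task via the agent
        subprocess.run(['sbatch', '--wrap', 'clearml-agent execute --id {}'.format(entry.task)])
    else:
        time.sleep(30)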
PompousBeetle71 could you check that the "output:destination" is the same for both experiments ?
oh, if this is the case, why not use the "main" server?
and this
server_info['url'] = f"http://{server_info['hostname']}:{server_info['port']}/{server_info['base_url']}/"
I'm not sure if this was solved, but I am encountering a similar issue.
Yep, it was solved (I think v1.7+)
With spawn and forkserver (which is used in the script above) ClearML is not able to automatically capture PyTorch scalars and artifacts.
The "trick" is to have Task.init before you spawn your code, then (since your code will not start from the same state) you should call Task.current_task(), which would basically make sure everything is...
BTW: any specific reason for going the RestAPI way and not using the python SDK ?