Could you try this one: `frameworks = { 'tensorboard': True, 'pytorch': False }`
This would log the TB output (in the background), but with no model registration (i.e. the serialized model is not captured)
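If it helps, a minimal sketch of where that dict would go, assuming Task.init's auto_connect_frameworks argument (the project/task names here are placeholders):
```
from clearml import Task

# TensorBoard scalars are still captured, but PyTorch checkpoints
# are not registered as output models
task = Task.init(
    project_name="examples",
    task_name="tb-only logging",
    auto_connect_frameworks={'tensorboard': True, 'pytorch': False},
)
```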
Oh I see, these are to secure your server (basically we recommend you replace the default key/secret 🙂)
Make sense ?
link to the line please 🙂
Hi LudicrousParrot69
A bit of background:
A Task is a job executed in the system (sometimes it is a training experiment, sometimes a controller like the pipeline). Basically any process can be a Task.
Specifically, the pipeline controller itself (i.e. the process running the Bayesian optimization) is a Task in the system (i.e. a job running). What it does (using the HyperParameterOptimizer) is clone previously executed Tasks (e.g. training experiments), change their parameters and moni...
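For reference, a rough sketch of that flow with the HyperParameterOptimizer (the base task id, parameter name, and metric names are placeholders):
```
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformParameterRange

# the controller itself is registered as a Task in the system
task = Task.init(project_name="examples", task_name="HPO controller",
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    # a previously executed training Task to clone (placeholder id)
    base_task_id="<template-task-id>",
    hyper_parameters=[UniformParameterRange("General/lr", 0.001, 0.1)],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
)
optimizer.start()   # clones, modifies and enqueues the experiments
optimizer.wait()
optimizer.stop()
```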
So there is a hack for it: `CLEARML_OFFLINE_MODE=1 python3 my_main.py`
Which is the same as calling Task.set_offline
Then inside the code, after the Task.init call:
```
task = Task.init(...)
Task.debug_simulate_remote_task(task_id="offline-1")
```
This will make things act as if it is running remotely, i.e. your Task.running_remotely() logic will be triggered.
not sure what the if here is?!
Do notice that in remote mode, all the arguments / data is read from the clearml-server into the cod...
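Putting the pieces together, a minimal sketch (the project/task names are placeholders):
```
from clearml import Task

# same effect as CLEARML_OFFLINE_MODE=1: nothing is sent to the server
Task.set_offline(offline_mode=True)

task = Task.init(project_name="examples", task_name="offline test")
# pretend this process was launched by an agent
Task.debug_simulate_remote_task(task_id="offline-1")
```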
Hi DilapidatedDucks58 just making sure, is the link the pytorch nightly artifactory? Or is it a direct link to the package? Reason for asking, I was not aware they have a proper artifactory... When the task runs, the trains-agent will update the installed packages with all the installed packages it used. Could you verify you have the correct version?
Regarding the extra files, you are correct, the docker container is reset every run, so they will get lost. What are those files for? Could you add ...
So a bit of explanation on how conda is supported. First, conda is not recommended; the reason is that it is very easy to create a setup with conda that is un-reproducible by conda (yes, exactly that). So what trains-agent does is try to install all the packages it can with conda first (not one by one, because that would break conda dependencies), then the packages it failed to install from conda it will install using pip.
or by trains
We just upload the image as is ... I think this is a SummaryWriter issue
The fact is that I use docker for running clearml server both on Linux and Windows.
My question was on running the agent: is it running with the `--docker` flag, i.e. docker mode?
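For reference, docker mode would be started with something like this (the queue name and image are just examples):
```
clearml-agent daemon --queue default --docker nvidia/cuda:11.7.1-runtime-ubuntu22.04
```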
Also, just forgot to note, that I'm running clearml-agent and clearml processes in virtual environment - conda environment on Windows and venv on Linux.
Yep that answers my question above 🙂
Does it make any sense to change `system_site_packages` to `true` if I r...
shows that the trains-agent is stuck running the first experiment, not the `trains_agent execute --full-monitoring --id a445e40b53c5417da1a6489aad616fee` process, which is the second trains-agent instance running inside the docker; if the task is aborted, this process should have quit...
Any suggestions on how I can reproduce it?
Was trying to figure out how the method knows that the docker image ID belongs to ECR. Do you have any insight into that?
Basically you should have the docker service login before running the agent, then the agent uses docker to run the image from the ECR.
Make sense ?
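A sketch of that flow, assuming AWS CLI v2 (the region and account id are placeholders):
```
# log the local docker service into ECR first
aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# then start the agent in docker mode; it can now pull the ECR image
clearml-agent daemon --queue default --docker
```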
The latest image seems to require NVIDIA drivers 460+ on the host
try this one:
https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel_20-12.html#rel_20-12
You do not need the cudatoolkit package; it is automatically installed if the agent is using conda as the package manager. See your clearml.conf for the exact configuration you are running:
https://github.com/allegroai/clearml-agent/blob/a56343ffc717c7ca45774b94f38bd83fe3ce1d1e/docs/clearml.conf#L79
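The relevant section looks roughly like this (a sketch; see the linked sample conf for the authoritative keys):
```
agent {
    package_manager {
        # "pip" or "conda"; with conda, cudatoolkit is resolved automatically
        type: conda,
    },
    # cuda/cudnn versions are auto-detected; uncomment to override
    # cuda_version: 11.2,
}
```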
The additional edges in the graph suggest that these steps somehow contain dependencies that I do not wish them to have.
PanickyMoth78 I think I understand what you are saying, but it is hard to see if there is a "bug" here or a feature...
Can you post the full code of the pipeline?
Are you saying this component should pull a specific git repo? `PipelineDecorator.component( ..., )` seems like there is no reference to a specific repo (the `repo` and `repo_branch` arguments etc. are missing), is that correct?
Hi MagnificentSeaurchin79
Unfortunately there is currently no way to reorder the plots, but you have a valid point. May I suggest opening a GitHub UX issue?
Regarding the debug samples, the difference is that the confusion matrix report is actually metadata; you can get these numbers via the API or by downloading them, but the debug samples are static images ...
BTW: you can try to produce an interactive side-by-side confusion matrix with plotly, and report it with `Logger.report_plotly`
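A quick sketch of that idea (the matrix values and label names are made up):
```
import plotly.graph_objects as go
from clearml import Task

task = Task.init(project_name="examples", task_name="interactive confusion matrix")

cm = [[50, 2], [5, 43]]   # made-up confusion matrix values
labels = ["cat", "dog"]

fig = go.Figure(data=go.Heatmap(z=cm, x=labels, y=labels, colorscale="Blues"))
fig.update_layout(title="Confusion matrix")

# reported as an interactive plot (Plots tab) rather than a static debug image
task.get_logger().report_plotly(
    title="confusion matrix", series="validation", iteration=0, figure=fig
)
```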
Hi ChubbyLouse32
If I understand correctly you can relatively easily take a clearml Task and launch it on LSF; an integration would be something like:
```
from time import sleep
import os

from clearml import Task
from clearml.backend_api.session.client import APIClient

client = APIClient()
# q_id is the ID of the queue this glue code should monitor
while True:
    result = client.queues.get_next_task(queue=q_id)
    if not result or not result.entry:
        sleep(5)
        continue
    task_id = result.entry.task
    # here is where we create the LSF job, this is just a pseudo code
    os.system("lsf-launch-cmd 'clearml...
```
However, when we try to access the webapi remotely through the VPN, we fail. The VPN logs don't show any blockage. Any ideas?
Maybe the VPN firewall blocks http connections? Or it might be BrightRabbit75's case, which would quite logically never show up anywhere
Honestly, this is all related to issue #340.
makes total sense.
But actually this is different from #340. That feature is to store the data on the Task, which means each Task in your "pipeline" would upload a new copy of the data. No?
I'd suggest some `task.detach()` method for remote execution maybe
That is a good idea, in theory it can also be used in local execution
Hi DilapidatedDucks58
trains-agent tries to resolve the torch package based on the specific cuda version inside the docker (or on the host machine if used in virtual-env mode). It seems to fail finding the specific version "torch==1.6.0.dev20200421+cu101"
I assume this version was automatically detected by trains when running manually. If this version came from a private artifactory you can add it to the trains.conf https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L...
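The entry would look roughly like this (a sketch; the index URL is a placeholder for your private artifactory):
```
agent {
    package_manager {
        # extra pip repositories searched when resolving packages
        extra_index_url: ["https://my.artifactory.example/api/pypi/pytorch-nightly/simple"]
    }
}
```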
MotionlessCoral18 I think there is a fix in the latest clearml-agent RC 1.4.0rc0, can you test and update if you are still having this issue?
JitteryCoyote63 are you suggesting it happens?
(obviously it should not 🙂)
Once the team is happy with the logging functionality, we'll move on to remote execution and things will update.
🙂
While I do have the access and secret defined in clearml.conf, and even in the WebUI, I still get similar
And you have your credentials set in the browser when deleting a Task?
But this is not a copy, this is a mount; your log showed `cp` failing
Hi AstonishingRabbit13
is there an option to omit the task_id so the final output will be deterministic and known prior to the task run?
Actually no 🙂 the full path is unique for the run, so you do not end up overwriting models.
You can get the full path from the UI (Models tab) or programmatically with `Model.query_models` or the `Task.get_task` methods.
What's the idea behind a fixed location for the model?
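For example, a sketch of pulling the uploaded model URLs after the run (the project/task names are placeholders):
```
from clearml import Task

task = Task.get_task(project_name="examples", task_name="training")

# every output model carries its full, unique upload URL
for model in task.models["output"]:
    print(model.name, model.url)
```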
PlainSquid19 yes, the link is available in the actual paid product 🙂
I don't think they have the documentation open yet...
My recommendation is to fill in the contact-us form, you'll get a free online tour as well 🙂