I mean this blob is then saved on the fs
It can if you do:
temp_file = task.connect_configuration('/path/to/config/file', name='configuration object is a config file')
Then temp_file is actually a local copy of the text coming from the Task.
When running in manual mode the content of '/path/to/config/file' is stored on the Task. When running remotely by the agent, the content from the Task is dumped into a temp file and the path to that file is returned in temp_file
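For example, a minimal sketch of how this behaves (the path and config name are illustrative):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")

# manual run: the file content is read and stored on the Task;
# remote run: the stored content is dumped to a temp file, whose path is returned
config_path = task.connect_configuration(
    '/path/to/config/file', name='my config file'
)
with open(config_path) as f:
    config_text = f.read()
```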
Oh that is odd... let me check something
Hi SpicyLion54
the -f flag is not very stable for pip (and cannot be added in requirements.txt). The ClearML agent will automatically find the correct torch (from the torch repository) based on the CUDA version it detects at runtime.
This means it automatically translates torch==1.8.1 and will pull from the correct repo based on the torch support table.
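So as a sketch, the requirements entry can stay plain, with no -f / --find-links needed:
```
# requirements.txt
torch==1.8.1
```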
If a Task is in the 'Completed' state, I think the only option is to 'Reset' it (see image).
In the UI yes, in code you can do task.mark_aborted(force=True)
You do clear the previous run execution but I think for a repetitive task this is fine.
I would avoid that, no?
You should pass only_published:
https://github.com/allegroai/clearml/blob/071caf53026330f3bb8019ee5db3d039562072f3/clearml/model.py#L444
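A minimal sketch, assuming the Model.query_models interface (the project name is illustrative):
```
from clearml import Model

# return only models that were published
models = Model.query_models(project_name="examples", only_published=True)
```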
SmallBluewhale13 the final path is automatically generated, you only need to specify the bucket itself. By default it will be your "files_server"
https://github.com/allegroai/clearml/blob/c58e8a4c6a1294f8acec6ed9cba81c3b91aa2abd/docs/clearml.conf#L10
You can either change the configuration (which will make sure all uploaded artifacts will always be there, including debug images etc.)
You can specify where you want the artifacts and debug images to be uploaded by setting:
https://allegro....
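For example, a per-task destination can be set like this (the bucket is illustrative):
```
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="upload destination demo",
    output_uri="s3://my-bucket/artifacts/",  # artifacts / debug images go here
)
```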
Can you see it on the console ?
Do you happen to know if there are any plans for an implementation with the logger variable, so that it would be possible to write to different tables if needed?
CheerfulGorilla72 what do you mean by "an implementation with the logger variable"? pytorch-lightning defaults to the TB logger, which clearml will automatically catch and log into the clearml-server; you can always add additional logs with the clearml interface Logger.current_logger().report_???
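For example (title/series/values are illustrative):
```
from clearml import Logger

# add a custom scalar on top of the automatic TB logging
Logger.current_logger().report_scalar(
    title="loss", series="validation", value=0.23, iteration=10
)
```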
What am I mis...
Hi VivaciousBadger56
Basically you can think of MLRun as "amazon lambda service without amazon". It is designed to run a "function" at scale on multiple nodes.
ClearML on the other hand is an MLOps platform. It does the experiment tracking, it orchestrates Task (think jobs), it does data management and lastly we recently released the serving. These are two different use cases.
Am I making sense here?
just want to be very precise and concise about them
Always appreciated 🙂
did you run trains-agent?
Hi ComfortableHorse5
Yes, this is more of a suggestion that you should write them using the platform capabilities. The UI implementation is being worked on, as well as a few helper classes; I think you'll be able to see a few in the next release 🙂
Full markdown edit on the project so you can create your own reports and share them (you can also put links to the experiments themselves inside the markdown). Notice this is not per experiment reporting (we kind of assumed maintaining a per experiment report is not realistic)
TroubledHedgehog16 if you have a preinstalled conda env, why would you need to reinstall it from a yml file? Also if this is the default python env, clearml-agent will inherit from it and use it (no real overhead there)
Notice the reason for "inheriting system" python environments is so that the agent could cache the individual Task requirements, meaning next time it will not need to reinstall anything
wdyt?
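For reference, a sketch of the relevant clearml.conf section (the values are illustrative):
```
agent {
    # cache fully-installed task environments so the next run skips reinstalling
    venvs_cache: {
        max_entries: 10
        path: ~/.clearml/venvs-cache
    }
}
```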
Could not find a version that satisfies the requirement pytorch~=1.7.1
Seems like pytorch 1.7.1 has no package for python 3.7 ?
EnviousPanda91 notice that when passing these arguments to clearml-agent you are actually passing default args. If you want an additional argument to always be used, set the extra_docker_arguments
here:
https://github.com/allegroai/clearml-agent/blob/9eee213683252cd0bd19aae3f9b2c65939d75ac3/docs/clearml.conf#L170
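i.e. something along these lines in clearml.conf (the arguments themselves are illustrative):
```
agent {
    # these arguments are always appended to the docker run command
    extra_docker_arguments: ["--ipc=host", "-e", "MY_ENV=value"]
}
```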
Hmm, notice that it does store symlinks to parent data versions (to save on multiple copies of the same file). If you call get_mutable_local_copy() you will get a standalone copy
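A minimal sketch (dataset id and target folder are illustrative):
```
from clearml import Dataset

ds = Dataset.get(dataset_id="<your-dataset-id>")

# cached copy, may contain symlinks into parent versions
read_only = ds.get_local_copy()

# standalone, writable copy with real files
writable = ds.get_mutable_local_copy(target_folder="/tmp/my_dataset_copy")
```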
HappyDove3
see here https://github.com/allegroai/clearml-pycharm-plugin 🙂
Hi UnsightlyHorse88
Hmm, try adding to your clearml.conf file:
agent.cpu_only = true
if that does not work, try adding to the OS environment:
export CLEARML_CPU_ONLY=1
If the right properties are set can the profile tab be added?
I guess that is doable, that said some of the graphs are not straightforward to support, like this one:
https://www.tensorflow.org/guide/images/tf_profiler/trace_viewer.png
Notice that we are using the same version:
https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/clearml_serving/engines/triton/Dockerfile#L2
The reason was that the previous version did not support torchscript (similar to the error you reported)
My question is, why don't you use the "allegroai/clearml-serving-triton:latest" container ?
Yeah I think this is a UI bug, any chance you mind opening a GitHub issue ?
What is the Model url?
print(model.url)
It seems like the naming of Task.create causes a lot of confusion (we are always open to suggestions and improvements). ReassuredTiger98 from your suggestion, it sounds like you would actually like more control in Task.init (let's leave Task.create aside, as its main function is not to log the current running code, but to create an auxiliary Task).
Did I understand you correctly ?
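To illustrate the distinction (repo/script values are illustrative):
```
from clearml import Task

# Task.init logs the currently running code (the usual entry point)
task = Task.init(project_name="examples", task_name="my experiment")

# Task.create builds an auxiliary Task without running or logging anything here,
# e.g. to be enqueued for an agent later
aux = Task.create(
    project_name="examples",
    task_name="auxiliary task",
    repo="https://github.com/user/repo.git",
    script="train.py",
)
```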
Hi WackyRabbit7 ,
Yes we had the same experience with kaggle competitions. We ended up having a flag that skipped the task init :(
Introducing offline mode is on the to-do list, but to be honest it has been there for a while. The thing is, since the Task object actually interacts with the backend, creating an offline mode means simulating the backend response. I'm open to hacking suggestions though :)
ExcitedFish86
How do I set the config for this agent? Some options can be set through env vars but not all of them
Hmm okay, if you are running an agent inside a container and you want it to spin "sibling" containers, you need to set the following:
mount the docker socket into the container running the agent itself (as you did), basically adding --privileged -v /var/run/docker.sock:/var/run/docker.sock
Allow the host to mount cache and configuration from the host into the siblin...
MelancholyElk85 assuming we are running with clearml 1.1.1 , let's debug the pipeline and instead of pipeline start/wait/stop :
Let's do:
pipeline.start_locally(run_pipeline_steps_locally=False)
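In context, a minimal sketch (the names are illustrative):
```
from clearml import PipelineController

pipe = PipelineController(name="debug pipeline", project="examples", version="1.0.0")
# ... add pipeline steps here ...

# run the pipeline logic locally, while each step is still launched via its queue
pipe.start_locally(run_pipeline_steps_locally=False)
```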
Hi JitteryCoyote63
Somehow I thought it was solved 🙂
1 ) Yes, please add a GitHub issue so we can keep track
2 )
Task.current_task().get_logger().flush(wait=True). # <-- WILL HANG HERE
Is this the main issue ?
@<1734020162731905024:profile|RattyBluewhale45> could you attach the full Task log? Also what do you have under "installed packages" in the original manual execution that works for you?
Right so this is checksum based?
correct
Are there plans to only store delta changes for files (i.e. store the changed byte instead of the entire file)?
Long story short, no 🙂
Basically delta changes are not scalable, and work only on text-based files (see git); they break very quickly when large files are involved, see the fun of git-lfs ...
Does that make sense? is there a specific reason you are thinking about byte granularity ?