Okay verified, it's the 'Agg' backend
Why would that require refactoring? The Dataset class should take care of it internally, no?
The reason my_name is a subproject is so that every version could be a "Task" inside that project; it's just easier to manage (or at least that was the idea).
This is very odd. Can you also post the file names here? Maybe an odd character is causing it?
Can you also test it with the latest clearml version (1.8.0) ?
Can you tell me what the serving example is, in terms of the explanation above, and what the Triton serving engine is?
Great idea!
This line actually creates the control Task (2): clearml-serving triton --project "serving" --name "serving example"
This line configures the control Task (the idea is that you can do that even when the control Task is already running, but in this case it is still in draft mode).
Notice the actual model serving configuration is already stored on the crea...
Hi IntriguedRat44
You can make it log offline (i.e. into a local folder/zip) by calling Task.set_offline(True). You can also set the environment variable TRAINS_OFFLINE_MODE=1. You could also just skip the Trains.init call
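For illustration, a minimal sketch of the offline flow (the project/task names here are just placeholders):
```python
from trains import Task  # in newer versions the package is `clearml`

# Log everything into a local folder/zip instead of sending it to the server
Task.set_offline(True)
task = Task.init(project_name="examples", task_name="offline run")
```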
Does that help?
Does a pipeline step behave differently?
Are you disabling it in the pipeline step ?
(disabling it for the pipeline Task has no effect on the pipeline steps themselves)
Hi SillyPuppy19
I think I lost you half way through.
I have a single script that launches training jobs for various models.
Is this like the automation example on the Github, i.e. cloning/enqueue experiments?
flag which is the model name, and dynamically loading the module to train it.
A Model has a UUID in the system as well, so you can use that instead of the name (which is not unique); would that solve the problem?
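As a rough sketch, referencing a model by its unique ID could look something like this (the ID placeholder is hypothetical):
```python
from clearml import InputModel

# Look the model up by its unique ID instead of by its (non-unique) name
model = InputModel(model_id="<model-uuid-from-the-ui>")
local_weights = model.get_weights()  # downloads a local copy of the weights file
```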
This didn't mesh well with Trains, because the project a...
Perhaps it is the imports at the start of the script only being assigned to the first task that is created?
Correct!
However, when I split the experiment task out completely, it seems to have built the cloned task correctly.
Nice!!
preinstalled in the environment (e.g. nvidia docker). These packages may not be available via pip, so the run will fail.
Okay, that's the part that I'm missing: how come in the first run the packages existed and in the cloned Task they are missing? I'm assuming the agents are configured basically the same (i.e. docker mode with the same network access). What did I miss here?
WickedGoat98
The webUI will look like the demo server: https://demoapp.trains.allegro.ai/
2. curl http://server-ip:8008 should return something like:
{"meta":{"id":"78a9dc77081348e2930d1f429fd7e092","trx":"78a9dc77081348e2930d1f429fd7e092","endpoint":{"name":"","requested_version":1.0,"actual_version":null},"result_code":400,"result_subcode":0,"result_msg":"Invalid request path /","error_stack":null},"data":{}}
3. curl http://server-ip:8080 should return something like:
` <!d...
It completed after the max_job limit (10)
Yep this is optuna "testing the water"
Hi ResponsiveCamel97
The agent generates a new configuration file to be mounted into the docker, with all the new folders as they will be seen from inside the docker itself. One of the changes is system_site_packages, since inside the docker we want the new venv to inherit everything from the docker's system-installed packages.
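For reference, the same switch exists in clearml.conf; an illustrative fragment (the agent sets this for you automatically in docker mode, so this is only a sketch):
```
agent {
    package_manager {
        # let the venv created by the agent see the docker's system-installed packages
        system_site_packages: true
    }
}
```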
Make sense ?
You mean parameters of the pipeline? Is this a pipeline from Tasks or from function decorator?
Hi SkinnyPanda43
Let's say that I install the shared libs with pip in editable mode on my development environment; how will the clearml-agent handle those libraries if I submit a job?
So installing packages from local folders with "-e" is in general ill-advised.
But using a full git path should work out of the box. For example, if you run pip install https://github.com/user/repo/repo.git, the agent will be able to install it on the remote machine. The main challenge...
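As a sketch, a requirements.txt line using a full git path that the agent could install remotely might look like this (the repo URL, branch, and egg name are hypothetical):
```
# requirements.txt
git+https://github.com/user/repo.git@main#egg=repo
```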
Mmm well, I can think of a pipeline that could save its state in the instant before the error occurred.
This is already the case: if you clone the pipeline Task, change Args/_continue_pipeline_ to True, and enqueue it.
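A rough sketch of that flow from code (the task ID and queue name are placeholders):
```python
from clearml import Task

# Clone the pipeline controller Task, flip the continue flag, and enqueue it
pipeline_task = Task.get_task(task_id="<original-pipeline-task-id>")
cloned = Task.clone(source_task=pipeline_task, name="resume pipeline")
cloned.set_parameter("Args/_continue_pipeline_", True)
Task.enqueue(cloned, queue_name="services")
```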
Is there a helper function option at all that means you can flush the clearml-agent working space automatically, or by command?
Every Task execution, the agent clears the venv (packages are cached locally, but the actual venv is cleared). If you want, you can turn on the venv cache, but there is no need to manually clear the agent's cache.
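If you do want the venv cache, it is a clearml.conf setting on the agent side; roughly (values are the ones suggested in the sample config):
```
agent {
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        path: ~/.clearml/venvs-cache
    }
}
```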
Hi SourOx12
I think that you do not actually need this one: step = step - cfg.start_epoch + 1. You can just do step += 1; ClearML will take care of the offset itself.
If you are using the latest RC: pip install clearml==0.17.5rc5. You can pass True and it will use the "files_server" as configured in your clearml.conf.
I used the http link as a filler to point to the files_server.
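Assuming the parameter in question is output_uri of Task.init (my assumption from the context), a sketch:
```python
from clearml import Task

# True -> use the files_server from clearml.conf as the default upload destination;
# an explicit URL (http, s3://, gs://, azure://) would override it
task = Task.init(project_name="examples", task_name="upload example", output_uri=True)
```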
Make sense ?
Ohh I see, makes total sense. I'm assuming the code base itself can do both
Is there a way to move existing pipelines between projects?
You should be able to, go to your settings page and turn on "show hidden folders"
Then go to your project, you should see a ".pipeline" sub project there; right click it and move it to another folder.
Any chance your code needs more than the main script, but it is Not in a git repo? The agent supports either a single script file, or a git repo with multiple files.
Seems it was fixed
MagnificentWorm7 thank you for noticing!
When a remote task runs Dataset.get() it is not using the correct URL
BoredHedgehog47 it will get the link the data was Registered with, when creating the Dataset.
This has Nothing to do with the local configuration, it can point to any arbitrary file location on the internet.
It was created there, because at the time of the dataset creation someone (manually or via the config) set a specific host as the file location, and to that host the files were uploaded (again ...
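In other words, the storage URL is fixed at registration time; a sketch of that step (project and bucket names are placeholders):
```python
from clearml import Dataset

# The output_url used here is what a later Dataset.get() will resolve,
# regardless of the local clearml.conf on the machine calling get()
ds = Dataset.create(dataset_project="examples", dataset_name="my_dataset")
ds.add_files("/path/to/files")
ds.upload(output_url="s3://my-bucket/datasets")  # or the files_server, gs://, azure://, etc.
ds.finalize()
```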
IntriguedRat44 how do I reproduce it ?
Can you confirm that commenting out the Task.init(..) call will fix it?
SmallBluewhale13
And the Task.init registers 0.17.2, even though it prints (while running the same code from the same venv) 0.17.2?
It was set to true earlier; I changed it to false to see if there would be any difference, but it doesn't seem like it.
I would actually just add Task.add_requirements('google.cloud') before the Task.init call (notice, it has to be before the init call).
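Roughly (project/task names are placeholders):
```python
from clearml import Task

# Must be called before Task.init so the requirement is added to the Task's package list
Task.add_requirements("google.cloud")
task = Task.init(project_name="examples", task_name="gcp example")
```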
that is odd..
So if you have 3 agents, how many concurrent experiments are they running? (actually running, not registered as running)