This is already part of the docker-compose file,
https://github.com/allegroai/clearml-server/blob/master/docker/docker-compose.yml
MoodyCentipede68 is diagram 2 a batch processing workflow?
I want that last python program to be executed with the environment that was created by the agent for this specific task
Well basically they all inherit the Python environment that points to the venv they started from, so at least in theory it should be transparent when the agent is spinning up the initial process.
I eventually found a different way of achieving what I needed
Now I'm curious, what did you end up doing ?
Hi @<1523701079223570432:profile|ReassuredOwl55> let me try to add some color here:
Basically we have two parts: (1) pipeline logic, i.e. the code that drives the DAG, and (2) pipeline components, e.g. model verification
The pipeline logic (1), i.e. the code that creates the DAG and the Tasks and enqueues them, will be running in the GitHub Actions context, i.e. this is the automation code. The pipeline components themselves (2), e.g. model verification, training, etc., are running using the clearml agents...
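As a rough sketch of that split (project/step names here are placeholders, not from the thread), the pipeline logic side could be as simple as:
from clearml import PipelineController

# (1) pipeline logic: build the DAG and enqueue it, e.g. from the CI job
pipe = PipelineController(name='ci-pipeline', project='examples', version='1.0')
# (2) each step is a pipeline component, executed by a clearml agent
pipe.add_step(name='verify_model', base_task_project='examples', base_task_name='model verification')
pipe.start(queue='services')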
None of them is problematic, this is what I'm trying to say 🙂
I think the minio browser gets confused.
if you want to test the upload time on the client you can try:
from time import time
task.flush(wait_for_uploads=True)
tic = time()
task.upload_artifact('test', '/tmp/localfile')
task.flush(wait_for_uploads=True)
print(time() - tic)
MysteriousBee56 yes, please change the trains code!!! Yippee! If you think someone else can benefit, feel free to PR :)
Regarding the double entry, that seems like an odd bug, how can I reproduce it?
Could it be you have an old OS environment variable overriding the configuration file?
Can you change the IP of the server in the conf file, and make sure it has an effect (i.e. the error changed)?
yup! That's what I was hoping you'd help me find a way to change the timing of. Is there an option I can override to make the retries more aggressive?
you mean wait for less?
add to your clearml.conf:
api.http.retries.backoff_factor = 0.1
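If you prefer, the same setting can be written in nested HOCON form (equivalent to the dotted key above):
api {
    http {
        retries {
            backoff_factor: 0.1
        }
    }
}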
Feel free to add to the UI request list:
https://github.com/allegroai/trains/issues/81
ElegantKangaroo44 my bad 😞 I missed the nuance in the description
There seems to be an issue in the web ui -> viewing plots in "view in experiment table" doesn't respect the "scalars to display" one sets when viewing in "view in fullscreen".
Yes, the info-panel does not respect the fullscreen-view selection. It's on the to-do list to add this ability, but it is still not implemented...
Hi CleanPigeon16
I was wondering how (or if) you handle interruptions.
Good question, basically (and I might be missing a few details but I think that's the general gist).
A new instance will be spun up (spot/regular based on your "compute budget") as long as there is a job in the "monitored" queue. That means that if a worker was kicked by Amazon (i.e. a spot instance), another one will be spun up instead, as long as there is a job in the queue. That means that what is probably missing in you...
Hi EcstaticPelican93
Sure, the model deployment itself (i.e. the serving engine) can be executed on any private network (basically like any other agent)
Make sense?
Hi GiganticTurtle0
The main issue is the cache=True
it will cause the second call to the function to essentially reuse the Task, ending with the same result.
Can you test with cache=False in the decorator?
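For reference, a minimal sketch of a decorated component with caching disabled (the component body itself is hypothetical):
from clearml import PipelineDecorator

# cache=False forces the component to re-execute on every call,
# instead of reusing a previously completed Task with the same inputs
@PipelineDecorator.component(cache=False)
def step_one(data):
    return [x * 2 for x in data]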
Hi SmallDeer34
Generally any torch.save(...) is logged/uploaded by clearml automatically. Specifically in your case I think the only missing one is the trainer_state.json, which I assume is a plain JSON file, and I imagine is part of the huggingface framework. You can easily upload it as an additional artifact with Task.upload_artifact
wdyt?
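Something along these lines (the file path is an example, adjust it to your output directory):
from clearml import Task

task = Task.current_task()
# upload the huggingface trainer state file as an additional artifact
task.upload_artifact('trainer_state', artifact_object='./output/trainer_state.json')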
So two folders with artifacts per experiment. I was wondering if there was a more efficient solution and if it could be combined.
Not sure I follow, two subfolders for two different things, isn't that how it is supposed to be?
Why is it using an OutputModel and an InputModel?
So calling OutputModel will create the new Model entity and upload the data, while InputModel will store it as a required input Model.
Basically on the Task you have input & output sections; when you clone the Task you are copying the input section into the newly created Task, and the assumption is that when you execute it, your code will create the output section.
Here when you clone the Task you will be cloning the reference to the InputModel (i...
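A short sketch of both sides (the model ID and file names are placeholders):
from clearml import Task, InputModel, OutputModel

task = Task.init(project_name='examples', task_name='model io')

# register an existing model as a required input of this Task
input_model = InputModel(model_id='<model-id>')
task.connect(input_model)

# create a new Model entity in the output section and upload the weights
output_model = OutputModel(task=task, framework='PyTorch')
output_model.update_weights(weights_filename='model.pt')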
Hi MelancholyElk85
Can I manually delete .zip files with datasets in .clearml/cache/storage_manager/datasets directory?
Yes, you can. I "think" the .zip is stored for easier access, but you can delete it; as long as the "extracted" folder exists, it should be fine.
Hi LazyTurkey38
, is it possible to have the agents keep a local version and only download the diff of the job commit to speed things up?
This is what it does, it has a local cached copy and it only pulls the latest changes
I'm not sure about the intended use of connect_configuration now.
Basically here is the rationale behind it:
I have a config file that I want to log on the Task, and I also want to be able to change this configuration file externally when launching using an agent (i.e. edit the content). I have a nested dictionary that I do not want to flatten and push as hyper-parameters because it is not very readable, so I want to store it in a more human readable form and edit it a...
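Roughly, that flow looks like this (the file name is an example):
from clearml import Task

task = Task.init(project_name='examples', task_name='config demo')

# logs the file content on the Task; when running via an agent, the (possibly
# edited) content stored on the Task is written to a local file and its path returned
config_path = task.connect_configuration('config.yaml', name='my_config')
with open(config_path) as f:
    config = f.read()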
SoreDragonfly16 the torchvision warning has nothing to do with the Trains warning.
The Trains warning means that somehow someone changed the state of the Task from running (in_progress) to "stopped" (aborted). Could it be one of the subprocesses raised an exception?
files_server: ://genuin-ai/
should be:
files_server:
I always have my notebooks in a git repo but suddenly it's not running them correctly.
What do you mean?
Can I switch off git diff (change detection)?
Yes, Task.init(..., auto_connect_frameworks={"detect_repository": False})
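In a full call that would look something like this (project/task names are placeholders):
from clearml import Task

task = Task.init(
    project_name='examples',
    task_name='no repo detection',
    # disables repository detection, and with it the git diff
    auto_connect_frameworks={'detect_repository': False},
)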
Hmm, you can delete the artifact with:
task._delete_artifacts(artifact_names=['my_artifact'])
However this will not delete the file itself.
To delete the file I would do:
from clearml.storage.helper import StorageHelper
remote_file = task.artifacts['delete_me'].url
h = StorageHelper.get(remote_file)
h.delete(remote_file)
task._delete_artifacts(artifact_names=['delete_me'])
Maybe we should have a proper interface for that? wdyt? what's the actual use case?
I am struggling with configuring ssh authentication in docker mode
GentleSwallow91 Basically the agent will automatically mount the .ssh folder into the container, just make sure you set the following in the clearml.conf:
force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L30
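In context, it sits under the agent section of clearml.conf, i.e.:
agent {
    # force git to clone over SSH, so the mounted .ssh keys are used
    force_git_ssh_protocol: true
}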
. So I'd like to use the command line argument in the first argparse, and then hide/delete/override it before running the second argparse.
Nice hack!
task.project is the project ID (not the name)
task.get_project_name() will return the project name
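For example:
from clearml import Task

task = Task.current_task()
print(task.project)             # project ID string
print(task.get_project_name())  # human-readable project name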
Why would you need to manually change the current run? You just provided the values via either the defaults or the command line, no?
what am I missing here?
ResponsiveHedgehong88 I'm not sure I stated it, but the argparser arguments and values are collected automatically from your current run and put on the Task; there is no need to manually set them if you have the argparser running on your machine. Basically it collects the current (i.e. the process running on your machine) settings, and "copies" them ...