Hi GreasyPenguin14
It looks like you are trying to delete a Task that does not exist
Any chance the cleanup service is misconfigured (i.e. accessing the incorrect server) ?
DeliciousBluewhale87 could you restart the pod and ssh into the host, and make sure the folder /opt/clearml/agent
exists and that there is no *.conf file in it?
Thank you AttractiveWoodpecker16 !
Removing the uncommitted changes so that you can launch it from an agent? Or is it visual only?
Hi MistakenDragonfly51
Hello everyone! First, thanks a lot to everyone that made ClearML possible,
❤
To your questions 🙂
Long story short, no, unless you really want to compile the dockers, and I can't see the real upside here. Yes, add the following volume mapping: /opt/clearml.conf:/root/clearml.conf
here: https://github.com/allegroai/clearml-server/blob/5de7c120621c2831730e01a864cc892c1702099a/docker/docker-compose.yml#L154
and configure your host's /opt/clearml.conf
with ...
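For reference, a minimal sketch of that volume mapping (assuming it goes under the agent-services volumes, as the linked line suggests; the rest of the service definition is abbreviated):
```
services:
  agent-services:
    volumes:
      # map the host-side config into the container so the agent picks it up
      - /opt/clearml.conf:/root/clearml.conf
```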
The easiest would be as an artifact (I think).
Let's assume you put it into a csv file (with pandas or manually)
To upload (from the pipeline Task itself):
task.upload_artifact(name='summary', artifact_object='~/my/summary.csv')
Then if you want to grab it from anywhere else:
task = Task.get_task(task_id='HPO controller Task id here')
my_csv = task.artifacts['summary'].get_local_copy()
If you want to store it as a dict it might be even easier:
` task.upload_artifact(name='summary', artifa...
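For completeness, a minimal sketch of the dict variant (the call above is truncated, so the exact arguments are an assumption; names and values here are illustrative):
```
from clearml import Task

# on the controller task: store the summary directly as a dict artifact
task = Task.current_task()
task.upload_artifact(name='summary', artifact_object={'best_f1': 0.91, 'best_lr': 0.001})

# from anywhere else: read the dict back without going through a file
controller = Task.get_task(task_id='HPO controller Task id here')
summary = controller.artifacts['summary'].get()  # returns the stored dict
```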
Hi JealousParrot68
clearml tracking of experiments run through kedro (similar to tracking with mlflow)
That's definitely very easy. I'm still not sure how Kedro scales on clusters. From what I saw, and I might have missed it, it seems more like a single instance with sub-processes, but no real ability to set up a different environment for the different steps in the pipeline, is this correct?
I think the challenge here is to pick the right abstraction matching. E.g. should a node in kedro (w...
I did nothing to generate a command-line. Just cloned the experiment and enqueued it. Used the server GUI.
Who/What created the initial experiment ?
I noticed that if I run the initial experiment by "python -m folder_name.script_name"
"-m module" as script entry is used to launch entry points like python modules (which is translated to "python -m script")
Why isn't the entry point just the python script?
The command line arguments are passed as arguments on the Args section of t...
Oh I see, what you need is to pass '--script script.py' as the entry point and '--cwd folder' as the working dir
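If you'd rather set these from code than in the UI, here is a minimal sketch assuming the Task.create API (project, repo and queue names are placeholders):
```
from clearml import Task

# create a task whose entry point is a plain script inside a sub-folder,
# instead of the "python -m package.module" form
task = Task.create(
    project_name='examples',
    task_name='module entry point workaround',
    repo='https://github.com/your-org/your-repo.git',  # placeholder repo
    script='script.py',            # equivalent of '--script script.py'
    working_directory='folder',    # equivalent of '--cwd folder'
)
Task.enqueue(task, queue_name='default')
```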
So could you re-explain assuming my piepline object is created by
pipeline = PipelineController(...)
?
pipeline.add_step(
    name='stage_train',
    parents=['stage_process', ],
    monitor_artifacts=['my_created_artifact'],
    base_task_project='examples',
    base_task_name='pipeline step 3 train model',
    parameter_override={'General/dataset_task_id': '${stage_process.id}'}
)
This will put the artifact named "my_created_artifact" from the step Tas...
For classification it's the F1 score, but for other tasks it may be something else, and I don't think that's a problem. We just have to log it, right?
Correct 🙂
Give me a few days, I will work on your suggestions and then let you know if I am not able to do this
Sounds good!
BTW:
previous_tasks = Task.get_tasks(task_filter={'tags': 'best'})
local_model_file = previous_tasks[0].artifacts['my_model'].get_local_copy()
okay but still I want to take only a row of each artifact
What do you mean?
How do I get from the node to the task object?
pipeline_task = Task.get_task(task_id=Task.current_task().parent)
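Putting the two together, a minimal sketch of a step reading the artifact that was monitored onto the controller (artifact name taken from the earlier add_step example):
```
from clearml import Task

# inside a pipeline step: the controller is the parent of the current task
pipeline_task = Task.get_task(task_id=Task.current_task().parent)

# the monitored artifact is now also available on the controller task
local_path = pipeline_task.artifacts['my_created_artifact'].get_local_copy()
```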
As you said you just need to clone, right click, clone?
That makes sense...
Basically in the open-source version the approach is everyone sees everything for maximum transparency (and also ease of use). I know there are access-roles in the paid tier and vault for exactly these types of things...
Where do you currently save them? and how do you pass them to the remote machine ?
Basically I think I'm asking, is your code multi-node enabled to begin with ?
(sure, we can try, conda is sometime flaky but is supported)
1. specify conda as the package manager: https://github.com/allegroai/trains-agent/blob/9a3f950ac689c50ba3415c42749a4bd8059e89a7/docs/trains.conf#L49
2. make sure trains-agent is installed on both nodes
3. assuming you already have an experiment in the system, right click on the experiment and clone it. Then press on the ID button next to the experiment name, and copy the task ID
4. ssh to each node and run:
` trains-agent execute --id <...
This is assuming you can just run two copies of your code, and they will become aware of one another.
And do you need to run your code inside a docker, or is venv enough ?
JitteryCoyote63 to filter out 'archived tasks' (i.e. exclude archived tasks):
Task.get_tasks(project_name="my-project", task_name="my-task", task_filter=dict(system_tags=["-archived"]))
could it be the polling on the Task (can't remember what the interval is)? It will update its state once every X minutes/seconds
I had no idea it was going to do that and sent your servers over 1.4M API hits unintentionally
Yeah, that is way too much, I think it relates to the frequency it updates the console 😞
This is exactly what I did here, and it is working 😞
https://demoapp.demo.clear.ml/projects/0e919ea1cc5c499b99e1ab85004b6e97/experiments/887edef09d4549e88b829a34c87d4d5b/output/execution
SmugLizard25 are you saying that with the latest version it does not work?
I'm not sure on the frequency it updates though
Unfortunately not, the queues tab shows only the number of tasks, but not the resources used in the queue
Oh, yes, that makes sense to add, I like that 🙂
(the main question is what data is there in the backend DBs, let me know what I can get)
the time taken to upload halved. It is puzzling because as you say it's not that much to upload.
Maybe it was the load on the server? meaning dealing with multiple requests at the same time delayed the requests?!
For now I've whittled down the number of entries to a more select but useful few and that has solved the issue. If it crops up again I will try connect_configuration properly.
Thanks for your help!
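For reference, a minimal sketch of connect_configuration with a dict (a file path works similarly; names and values here are illustrative):
```
from clearml import Task

task = Task.init(project_name='examples', task_name='config example')

# attach the whole configuration as a single object instead of many separate entries;
# the returned value reflects any override when the task is executed remotely
config = task.connect_configuration({'entries': ['a', 'b', 'c'], 'threshold': 0.5},
                                    name='my_config')
```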
My pleasure 🙂
One last question: Is it possible to set the pip_version task-dependent?
no... but why would it matter on a Task basis ? (meaning what would be a use case to change the pip version per Task)
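For context, the pip version is an agent-level setting in the agent's clearml.conf rather than a per-Task property; a sketch of the relevant section (the version range is illustrative):
```
# clearml.conf on the agent machine
agent {
    package_manager {
        # pip version used when the agent builds the task environment
        pip_version: ">=20.1,<20.2"
    }
}
```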