I assume that at some points in the execution, the client (where the task is running) is sending JSONs to the mongo service, and that is what we see in the web UI.
Since we are talking about a case where there is no internet available, maybe these could be dumped into files/stdout and the user could manually insert them.
The manual insertion UX could be something like a CLI copy-paste or an endpoint for files - but since your UX is so good ( 🙂 ) I'm sure you'll figure this part out better
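Something like this sketch is what I had in mind, assuming the offline-mode API that newer clearml versions expose (Task.set_offline / Task.import_offline_session); the project/task names and the zip path are just placeholders:
```
from clearml import Task

# On the machine with no internet: record everything locally instead of
# sending it to the server.
Task.set_offline(offline_mode=True)
task = Task.init(project_name='my_project', task_name='offline_run')
# ... training code runs as usual, everything is captured into a local session zip ...

# Later, on a machine that can reach the server, import the recorded session
# (the path is a placeholder for the zip that offline mode produced):
# Task.import_offline_session('/path/to/offline_session.zip')
```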
Very nice thanks, I'm going to try the SA server + agents setup this week, let's see how it goes ✌
pgrep -af trains
shows that there is nothing running with that name
I never installed trains on this environment
I'm really confused, I'm not sure what is wrong and what the relationship is between the templates, the agent, and all of those things
In the meantime, I'm giving up on the pipeline thing and I'll write a bash script to orchestrate the execution, because I need to deliver and I'm not feeling this is going anywhere
On a final note, I'd love for this to work as expected, but I'm not sure what you need from me. A fully reproducible example will be hard because obviously this is proprietary code. What ...
it seems that only the packages that are in the script are getting installed
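If the agent only installs what it auto-detects from the script's imports, one workaround I'd try is declaring the missing packages explicitly; a sketch, assuming the standard clearml API ('some_missing_package' is a placeholder), noting that Task.add_requirements has to be called before Task.init:
```
from clearml import Task

# declare a package the import auto-detection misses; must run before Task.init
Task.add_requirements('some_missing_package')
task = Task.init(project_name='my_project', task_name='my_task')
```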
but the pending task says it's in the queue
Especially from the standpoint of a team leader or anyone else in a supervising role (or anyone viewing the experiment who is not the code author), when looking at an experiment you want to see the actual code
it's doubly weird, because a task that the pipeline says is "in progress" is actually already completed
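For reference, a quick way to cross-check a step's real status against what the pipeline shows; a sketch, with the task id a placeholder copied from the web UI:
```
from clearml import Task

# fetch the step's underlying task and print its actual status,
# e.g. 'completed' even while the pipeline still shows it in progress
step_task = Task.get_task(task_id='<step_task_id>')
print(step_task.get_status())
```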
So prior to doing any work on the trains autoscaler service, I should first create an auto-scaling group in AWS?
no this is from the task execution that failed
No I don't have trains anywhere in my code
how do I run this wizard? Is this wizard trains' or AWS's?
sorry I think it trimmed it
```
# define pipeline
pipe = clearml.PipelineController(
    name=TASK_NAME,
    project=PROJECT_NAME,
    version='0.0.1',
    add_pipeline_tags=False,
)
pipe.set_default_execution_queue('default')
```
Adding steps:
```
pipe.add_step(
    name=f'{start_date_train}_{end_date_train}_choose_best',
    base_task_project=CHOOSE_PROJECT_NAME,
    base_task_name=CHOOSE_TASK_NAME,
    parameter_override=params_override,
    ...
```
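For context, after the steps are added the controller gets kicked off roughly like this; a sketch, not part of the trimmed snippet above, assuming the standard PipelineController API:
```
# start_locally() runs the controller logic on this machine while the steps
# are sent to the 'default' execution queue set above; pipe.start() would
# instead enqueue the controller itself (e.g. on a services queue)
pipe.start_locally()
```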
can't remember, I just restarted everything so I don't have this info now
Okay so regarding the version - we are using 1.1.1
The thing with this error is that it happens sometimes, and when it happens it never goes away...
I don't know what causes it, but we have one host where it works okay, then someone else checks out the repo and tries and it fails with this error, while another person can do the same and it will work for them
I might, I'll look at the internals later cause at a glance I didn't really get the logic inside get_local_copy
... the `if` there ends with `if ... not cached_file: return cached_file`, which from reading doesn't make much sense
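In isolation the pattern I mean reads roughly like this; a paraphrased sketch, not the actual clearml code:
```
def get_local_copy_sketch(cached_file):
    # paraphrased: when the cache lookup failed, cached_file is falsy (None or ''),
    # so this branch just hands that falsy value back to the caller
    if not cached_file:
        return cached_file
    ...
```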
Cool, now I understand the auto detection better
How did it come to this? I didn't configure anything, I'm using the trains AMI, with the suggested instance type
And yes, it makes perfect sense, thanks for the answer
Yep, if communication is both ways, there is no way (that I can think of) it can be solved for offline mode.
But if the calls made from the server to the client can be treated as redundant in a specific setup (some functionality will not work, but enough valuable functionality remains), then it is possible with the manual approach